Abstract

In this work we use the tensorial language developed in [S] and [H] to differentiate functions of eigenvalues of symmetric matrices. We describe the formulae for the $k$-th derivative of such functions in two cases. The first case concerns the derivatives of the composition of an arbitrary differentiable function with the eigenvalues at a matrix with distinct eigenvalues. The second development describes the derivatives of the composition of a separable symmetric function with the eigenvalues at an arbitrary symmetric matrix. In the concluding section we re-derive the formula for the Hessian of a general spectral function at an arbitrary point. Our approach leads to a shorter, streamlined derivation than the original in [§]. The language we use, based on the generalized Hadamard product, allows us to view the differentiation of spectral functions as a routine calculus-type procedure.
1 Introduction

We say that a real-valued function $F$ of a symmetric matrix argument is spectral if it has the following invariance property:

$$F(UXU^T) = F(X),$$

for every symmetric matrix $X$ in its domain and every orthogonal matrix $U$. The restriction of $F$ to the subspace of diagonal matrices defines (almost) a function $f(x) := F(\operatorname{Diag} x)$ on a vector argument $x \in \mathbb{R}^n$. It is easy to see that $f : \mathbb{R}^n \to \mathbb{R}$ has the property

$$f(x) = f(Px) \quad \text{for any permutation matrix } P \text{ and any } x \in \operatorname{domain} f.$$

We call such functions symmetric. It is not difficult to see that $F(X) = f(\lambda(X))$, where $\lambda(X)$ is the vector of eigenvalues of $X$. An important subclass of spectral functions is obtained when $f(x) = g(x_1) + \cdots + g(x_n)$ for some function $g$ of one real variable. We call such symmetric functions separable, and their corresponding spectral functions will be called separable spectral functions.
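
As a quick numerical illustration of these definitions, the following sketch checks the invariance $F(UXU^T) = F(X)$, the symmetry $f(Px) = f(x)$, and the identity $F(X) = f(\lambda(X))$ for one separable example. The choice $g(t) = t^3$, the helper names, and the use of NumPy are assumptions made only for this illustration; they do not come from the paper.

```python
import numpy as np

def g(t):
    # Illustrative scalar function (an assumption); any g of one real variable would do.
    return t ** 3

def f(x):
    # Separable symmetric function f(x) = g(x_1) + ... + g(x_n).
    return float(np.sum(g(np.asarray(x))))

def F(X):
    # Spectral function F = f o lambda, evaluated through the eigenvalues of X.
    return f(np.linalg.eigvalsh(X))

rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
X = (B + B.T) / 2                                  # random symmetric matrix
U, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal matrix
x = rng.standard_normal(n)
perm = rng.permutation(n)

invariance_gap = abs(F(U @ X @ U.T) - F(X))        # |F(U X U^T) - F(X)|
symmetry_gap = abs(f(x[perm]) - f(x))              # |f(Px) - f(x)|
trace_gap = abs(F(X) - np.trace(X @ X @ X))        # for g(t) = t^3, F(X) = tr(X^3)
```

All three gaps should be at the level of floating-point round-off.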

In [3] an explicit formula for the gradient of the spectral function $F$ in terms of the derivatives of the symmetric function $f$ was given:

$$\nabla (f \circ \lambda)(X) = V\big(\operatorname{Diag} \nabla f(\lambda(X))\big)V^T, \tag{1}$$

where $V$ is any orthogonal matrix such that $X = V(\operatorname{Diag} \lambda(X))V^T$ is the ordered spectral decomposition of $X$. In [B] a formula for the Hessian of $F$ was given, whose structure appeared quite different from the one for the gradient. In this work we generalize the work in [3] and [B] by proving the following formula for the $k$-th derivative of a spectral function

$$\nabla^k (f \circ \lambda)(X) = V\Big(\sum_{\sigma \in P^k} \operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\lambda(X))\Big)V^T, \tag{2}$$

where again $X = V(\operatorname{Diag} \lambda(X))V^T$. The sum is taken over all permutations on $k$ elements. (The role of the permutations is just as a convenient tool for enumerating the maps $\mathcal{A}_{\sigma}(x)$.) The precise meaning of the operators $\operatorname{Diag}^{\sigma}$, generalizing the $\operatorname{Diag}$ operator, is explained in the next section, see Formula (4). The main thing to keep in mind about the formula is that the maps $\mathcal{A}_{\sigma}(x)$ depend only on the partial derivatives of $f(x)$, up to order $k$, and do not depend on the eigenvalues. In this sense the process of differentiating $f \circ \lambda$ leaves the eigenvalues unscathed, since the only way in which they participate in the formula above is through the compositions $\mathcal{A}_{\sigma}(\lambda(X))$ and the conjugation by the orthogonal matrix $V$.

We show that Formula (2) holds in two general subcases. It holds when $f$ is a $k$-times differentiable function, not necessarily symmetric, and $X$ is a matrix with distinct eigenvalues. It also holds when $f$ is a separable symmetric function and $X$ is an arbitrary symmetric matrix. We give an easy recipe for computing the maps $\mathcal{A}_{\sigma}(x)$ in the above two cases.

In addition, we show that in the case when $f$ is a $k$-times continuously differentiable, separable, symmetric function, Formula (2) can be significantly simplified. In that case, all the maps $\mathcal{A}_{\sigma}(x)$ coincide, that is, $\mathcal{A}_{\sigma_1}(x) = \mathcal{A}_{\sigma_2}(x)$ for any two permutations $\sigma_1, \sigma_2$ on $k$ elements.

Finally, in the last section, we re-derive the formula for the Hessian of a general spectral function at an arbitrary symmetric matrix. Our approach leads to a shorter, more streamlined derivation than the original derivation in [H].

The language that we use, based on the generalized Hadamard product, allows us to differentiate Formula (2) just as one would expect: by writing the differential quotient and taking the limit as the perturbation goes to zero. This gives a clear view of where the different pieces in the differential come from and gives the process a routine calculus-like flavour.

In the next section, we give all the necessary notation, definitions, and background results to 
make the reading of this work self-contained. 
2 Notation and background results

We use essentially the same notation as in the preceding two papers [H] and [H]. We briefly summarize it here for completeness, so that this part can be read independently.

By $S^n$, $O^n$, and $P^n$ we denote the sets of all $n \times n$ real symmetric, orthogonal, and permutation matrices, respectively. By $M^n$ we denote the real Euclidean space of all $n \times n$ matrices with inner product $\langle X, Y \rangle = \operatorname{tr}(XY^T)$. For $A \in S^n$, $\lambda(A) = (\lambda_1(A), \dots, \lambda_n(A))$ is the vector of its eigenvalues ordered in nonincreasing order. By $\mathbb{N}_k$ we denote the set $\{1, 2, \dots, k\}$. For any vector $x$ in $\mathbb{R}^n$, $\operatorname{Diag} x$ denotes the diagonal matrix with the vector $x$ on the main diagonal, and $\operatorname{diag} : M^n \to \mathbb{R}^n$ denotes its conjugate operator, defined by $\operatorname{diag}(X) = (x_{11}, \dots, x_{nn})$. By $\mathbb{R}^n_{\downarrow}$ we denote the cone of all vectors $x$ in $\mathbb{R}^n$ such that $x_1 \ge x_2 \ge \cdots \ge x_n$. Denote the standard basis in $\mathbb{R}^n$ by $e^1, e^2, \dots, e^n$. For a permutation matrix $P \in P^n$ we say that $\sigma : \mathbb{N}_n \to \mathbb{N}_n$ is its corresponding permutation map, and write $P \leftrightarrow \sigma$, if for any $h \in \mathbb{R}^n$ we have $Ph = (h_{\sigma(1)}, \dots, h_{\sigma(n)})^T$ or, in other words, $P^T e^i = e^{\sigma(i)}$ for all $i = 1, \dots, n$. The symbol $\delta_{ij}$ denotes the Kronecker delta. It is equal to one if $i = j$ and zero otherwise.
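
The correspondence $P \leftrightarrow \sigma$ can be made concrete with a small sketch. It uses plain Python lists and 0-based indices, and the helper names are assumptions introduced only for this illustration.

```python
def perm_matrix(sigma):
    """Matrix P with (P h)_i = h_sigma(i); sigma is given 0-based."""
    n = len(sigma)
    return [[1 if j == sigma[i] else 0 for j in range(n)] for i in range(n)]

def matvec(A, v):
    # Plain matrix-vector product on nested lists.
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

sigma = [2, 0, 1]      # 0-based form of sigma(1) = 3, sigma(2) = 1, sigma(3) = 2
P = perm_matrix(sigma)
h = [10, 20, 30]
Ph = matvec(P, h)      # should list h_sigma(1), h_sigma(2), h_sigma(3)

# P^T e^i should equal e^{sigma(i)} for every basis vector e^i.
basis = [[1 if j == i else 0 for j in range(3)] for i in range(3)]
basis_images = [matvec(transpose(P), e) for e in basis]
```

Here `Ph` reorders the entries of `h` according to `sigma`, and each `basis_images[i]` is the basis vector with index `sigma[i]`.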

Any vector $\mu \in \mathbb{R}^n$ defines a partition of $\mathbb{N}_n$ into disjoint blocks, where integers $i$ and $j$ are in the same block if, and only if, $\mu_i = \mu_j$. In general, the blocks that $\mu$ determines need not contain consecutive integers. We agree that the block containing the integer 1 will be the first block, $I_1$, the block containing the smallest integer that is not in $I_1$ will be the second block, $I_2$, and so on. The number $r$ will denote the number of blocks in the partition. Let $\iota_l$ denote the largest integer in $I_l$ for all $l = 1, \dots, r$. For any two integers $i, j \in \mathbb{N}_n$ we will say that they are equivalent (with respect to $\mu$) and write $i \sim j$ (or $i \sim_{\mu} j$) if $\mu_i = \mu_j$, that is, if they are in the same block. Two $k$-indexes $(i_1, \dots, i_k)$ and $(j_1, \dots, j_k)$ are called equivalent if $i_l \sim j_l$ for all $l = 1, 2, \dots, k$, and we will write

$$(i_1, \dots, i_k) \sim_{\mu} (j_1, \dots, j_k).$$
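
The block structure determined by $\mu$ is easy to compute. The sketch below is only an illustration: the function names are hypothetical, indices are 1-based to match the text, and exact floating-point equality is assumed to be the right notion of "equal entries".

```python
def blocks(mu):
    """Partition {1, ..., n} into blocks of equal entries of mu (1-based indices).

    Blocks are listed in order of first appearance, so the first block
    contains 1, the second contains the smallest index outside the first, etc.
    """
    result = []   # list of blocks I_1, I_2, ...
    seen = {}     # entry value -> position of its block in `result`
    for i, value in enumerate(mu, start=1):
        if value not in seen:
            seen[value] = len(result)
            result.append([])
        result[seen[value]].append(i)
    return result

def equivalent(mu, idx_a, idx_b):
    """True if two multi-indices (1-based tuples) are equivalent with respect to mu."""
    return len(idx_a) == len(idx_b) and all(
        mu[a - 1] == mu[b - 1] for a, b in zip(idx_a, idx_b)
    )

mu = [5.0, 3.0, 5.0, 1.0, 3.0]
I = blocks(mu)                       # the blocks I_1, ..., I_r
r = len(I)                           # number of blocks
largest = [max(b) for b in I]        # largest index in each block
```

For this `mu` the blocks do not consist of consecutive integers, which is the general situation described above.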

A $k$-tensor, $T$, on $\mathbb{R}^n$ is a map from $\mathbb{R}^n \times \cdots \times \mathbb{R}^n$ ($k$ times) to $\mathbb{R}$ that is linear in each argument separately. Denote the set of all $k$-tensors on $\mathbb{R}^n$ by $T^{k,n}$. The value of the $k$-tensor at $(h_1, \dots, h_k)$ will be denoted by $T[h_1, \dots, h_k]$. The tensor is called symmetric if for any permutation, $\sigma$, on $\mathbb{N}_k$ it satisfies $T[h_{\sigma(1)}, \dots, h_{\sigma(k)}] = T[h_1, \dots, h_k]$, for any $h_1, \dots, h_k \in \mathbb{R}^n$. Given a vector $\mu \in \mathbb{R}^n$, a tensor $T \in T^{k,n}$ is $\mu$-symmetric if for any permutation $P \in P^n$ such that $P\mu = \mu$, we have $T[Ph_1, \dots, Ph_k] = T[h_1, \dots, h_k]$, for any $h_1, \dots, h_k \in \mathbb{R}^n$. A $k$-tensor valued map, $\mu \in \mathbb{R}^n \to \mathcal{F}(\mu) \in T^{k,n}$, is $\mu$-symmetric if for every $\mu \in \mathbb{R}^n$ and permutation matrix $P$ we have $\mathcal{F}(P\mu)[Ph_1, \dots, Ph_k] = \mathcal{F}(\mu)[h_1, \dots, h_k]$, for any $h_1, \dots, h_k \in \mathbb{R}^n$. The tensor is called $\mu$-block-constant if $T^{i_1 \dots i_k} = T^{j_1 \dots j_k}$ whenever $(i_1, \dots, i_k) \sim_{\mu} (j_1, \dots, j_k)$. A $k$-tensor valued map, $\mu \in \mathbb{R}^n \to \mathcal{F}(\mu) \in T^{k,n}$, is block-constant if $\mathcal{F}(\mu)$ is $\mu$-block-constant for every $\mu$. Clearly, every $\mu$-block-constant tensor is $\mu$-symmetric. By $T[h]$ we denote the $(k-1)$-tensor on $\mathbb{R}^n$ given by $T[\cdot, \dots, \cdot, h]$. Similarly for $T[M]$, if $T$ is a $k$-tensor on $M^n$ and $M \in M^n$. The following easy lemma was proved in [H].

Lemma 2.1 If a $k$-tensor valued map, $\mu \in \mathbb{R}^n \to T(\mu) \in T^{k,n}$, is $\mu$-symmetric and differentiable, then its differential, $\nabla T(\mu)$, is also $\mu$-symmetric.

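For each permutation $\sigma$ on $\mathbb{N}_k$ we define the $\sigma$-Hadamard product between $k$ matrices to be a $k$-tensor on $\mathbb{R}^n$ as follows. Given any $k$ basic matrices $H_{p_1 q_1}, H_{p_2 q_2}, \dots, H_{p_k q_k}$,

$$(H_{p_1 q_1} \circ_{\sigma} H_{p_2 q_2} \circ_{\sigma} \cdots \circ_{\sigma} H_{p_k q_k})^{i_1 i_2 \dots i_k} = \begin{cases} 1, & \text{if } i_s = p_s = q_{\sigma(s)},\ \forall s = 1, \dots, k, \\ 0, & \text{otherwise}. \end{cases}$$

Extend this product to a multi-linear map on $k$ matrix arguments:

$$(H_1 \circ_{\sigma} H_2 \circ_{\sigma} \cdots \circ_{\sigma} H_k)^{i_1 i_2 \dots i_k} = H_1^{i_1 i_{\sigma^{-1}(1)}} \cdots H_k^{i_k i_{\sigma^{-1}(k)}}. \tag{3}$$

Notice that when $k = 1$ we have $\circ_{(1)} H = \operatorname{diag} H$. Let $T$ be an arbitrary $k$-tensor on $\mathbb{R}^n$ and let $\sigma$ be a permutation on $\mathbb{N}_k$. We define $\operatorname{Diag}^{\sigma} T$ to be a $2k$-tensor on $\mathbb{R}^n$ in the following way

$$(\operatorname{Diag}^{\sigma} T)^{i_1 \dots i_k}_{j_1 \dots j_k} = \begin{cases} T^{i_1 \dots i_k}, & \text{if } i_s = j_{\sigma(s)},\ \forall s = 1, \dots, k, \\ 0, & \text{otherwise}. \end{cases} \tag{4}$$

When $k = 1$ we have $\operatorname{Diag}^{(1)} x = \operatorname{Diag} x$ for any $x \in \mathbb{R}^n$. Any $2k$-tensor, $T$, on $\mathbb{R}^n$ can naturally be viewed as a $k$-tensor on $M^n$ in the following way

$$T[H_1, \dots, H_k] = \sum_{p_1, q_1 = 1}^{n} \cdots \sum_{p_k, q_k = 1}^{n} T^{p_1 \dots p_k}_{q_1 \dots q_k} H_1^{p_1 q_1} \cdots H_k^{p_k q_k}.$$

Define the dot product between two tensors in $T^{k,n}$ in the usual way:

$$\langle T_1, T_2 \rangle = \sum_{p_1, \dots, p_k = 1}^{n} T_1^{p_1 \dots p_k} T_2^{p_1 \dots p_k}.$$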
We define an action (called conjugation) of the orthogonal group $O^n$ on the space of all $k$-tensors on $\mathbb{R}^n$. For any $k$-tensor, $T$, and $U \in O^n$ this action will be denoted by $UTU^T \in T^{k,n}$:

$$(UTU^T)^{i_1 \dots i_k} = \sum_{p_1 = 1}^{n} \cdots \sum_{p_k = 1}^{n} T^{p_1 \dots p_k} U^{i_1 p_1} \cdots U^{i_k p_k}. \tag{5}$$

We showed in [H] that this action is norm preserving and associative: $V(UTU^T)V^T = (VU)T(VU)^T$ for all $U, V \in O^n$.

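The $\operatorname{Diag}^{\sigma}$ operator, the $\sigma$-Hadamard product, and conjugation by an orthogonal matrix are connected by the following formula, see [H].

Theorem 2.2 For any $k$-tensor $T$, any matrices $H_1, \dots, H_k$, any orthogonal matrix $V$, and any permutation $\sigma$ in $P^k$ we have the identity

$$\langle T, \tilde{H}_1 \circ_{\sigma} \cdots \circ_{\sigma} \tilde{H}_k \rangle = \big(V(\operatorname{Diag}^{\sigma} T)V^T\big)[H_1, \dots, H_k], \tag{6}$$

where $\tilde{H}_i = V^T H_i V$, $i = 1, \dots, k$.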
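
The identity of Theorem 2.2 can be checked numerically for small sizes. The sketch below implements the $\sigma$-Hadamard product and the $\operatorname{Diag}^{\sigma}$ operator directly from their entrywise definitions and tests the case $k = 2$. The array layout (upper indices first, then lower indices), the 0-based permutations, and all function names are assumptions of this illustration, not notation from the paper.

```python
import itertools
import numpy as np

def inverse(sigma):
    # Inverse of a 0-based permutation given as a tuple.
    inv = [0] * len(sigma)
    for s, t in enumerate(sigma):
        inv[t] = s
    return inv

def sigma_hadamard(mats, sigma):
    """k-tensor with entries prod_s H_s[i_s, i_{sigma^{-1}(s)}]."""
    k, n = len(mats), mats[0].shape[0]
    inv = inverse(sigma)
    out = np.zeros((n,) * k)
    for idx in itertools.product(range(n), repeat=k):
        out[idx] = np.prod([mats[s][idx[s], idx[inv[s]]] for s in range(k)])
    return out

def diag_sigma(T, sigma):
    """2k-tensor stored as out[i_1..i_k, j_1..j_k], nonzero only when i_s = j_{sigma(s)}."""
    k, n = T.ndim, T.shape[0]
    inv = inverse(sigma)
    out = np.zeros((n,) * (2 * k))
    for idx in itertools.product(range(n), repeat=k):
        lower = tuple(idx[inv[s]] for s in range(k))   # j_s = i_{sigma^{-1}(s)}
        out[idx + lower] = T[idx]
    return out

rng = np.random.default_rng(1)
n = 3
T = rng.standard_normal((n, n))
H1, H2 = [(B + B.T) / 2 for B in rng.standard_normal((2, n, n))]
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
H1t, H2t = V.T @ H1 @ V, V.T @ H2 @ V

gaps = []
for sigma in [(0, 1), (1, 0)]:                          # identity and the transposition
    lhs = np.sum(T * sigma_hadamard([H1t, H2t], sigma))
    D = diag_sigma(T, sigma)
    C = np.einsum('abcd,ia,jb,kc,ld->ijkl', D, V, V, V, V)   # conjugation V D V^T
    rhs = np.einsum('ijkl,ik,jl->', C, H1, H2)               # C[H1, H2]
    gaps.append(abs(lhs - rhs))
```

The explicit loops cost $O(n^k)$ operations and are meant for verification on tiny examples only.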
We will also need the following lemma from [H].

Lemma 2.3 Let $T$ be any $2k$-tensor on $\mathbb{R}^n$, $U \in O^n$, and let $H$ be any matrix. Then, the following identity holds:

$$U(T[U^T H U])U^T = (UTU^T)[H].$$

Given a permutation $\sigma$ on $\mathbb{N}_k$ we can naturally view it as a permutation on $\mathbb{N}_{k+1}$ fixing the last element. Let $\tau_l$ be the transposition $(l, k+1)$, for all $l = 1, \dots, k, k+1$. Define $k+1$ permutations, $\sigma_{(l)}$, on $\mathbb{N}_{k+1}$, as follows:

$$\sigma_{(l)} = \sigma \tau_l, \quad \text{for } l = 1, \dots, k, k+1. \tag{7}$$

Informally speaking, given the cycle decomposition of $\sigma$, we obtain $\sigma_{(l)}$, for each $l = 1, \dots, k$, by inserting the element $k+1$ immediately after the element $l$, and when $l = k+1$, the permutation $\sigma_{(k+1)}$ fixes the element $k+1$. Clearly $\sigma_{(l)}^{-1}(k+1) = l$ for all $l$, and

$$\{\text{All permutations on } \mathbb{N}_{k+1}\} = \{\sigma_{(l)} \mid \sigma \text{ is a permutation on } \mathbb{N}_k,\ l = 1, \dots, k, k+1\}.$$
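
The claim that the permutations $\sigma_{(l)}$ enumerate all permutations on $\mathbb{N}_{k+1}$ exactly once can be verified by brute force for small $k$. In the sketch below a permutation is stored as a tuple of images with 1-based values; this representation and the function name `sigma_l` are assumptions of the illustration.

```python
from itertools import permutations

def sigma_l(sigma, l):
    """Return sigma_(l) = sigma * tau_l on {1, ..., k+1}.

    `sigma` is a tuple with sigma[i - 1] = sigma(i); tau_l is the
    transposition (l, k+1), applied first.
    """
    k = len(sigma)
    ext = list(sigma) + [k + 1]              # sigma viewed on N_{k+1}, fixing k+1
    tau = list(range(1, k + 2))              # identity on N_{k+1}
    tau[l - 1], tau[k] = k + 1, l            # swap l and k+1
    return tuple(ext[tau[i] - 1] for i in range(k + 1))

k = 3
pairs = [(s, l) for s in permutations(range(1, k + 1)) for l in range(1, k + 2)]
images = [sigma_l(s, l) for s, l in pairs]
```

With $k = 3$ there are $3! \cdot 4 = 24$ pairs $(\sigma, l)$, matching the $4!$ permutations on four elements.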

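For a fixed vector $\mu \in \mathbb{R}^n$ we define $k$ linear maps

$$T \in T^{k,n} \to T^{(l)}_{\text{out}} \in T^{k+1,n}, \quad \text{for } l = 1, 2, \dots, k,$$

as follows:

$$(T^{(l)}_{\text{out}})^{i_1 \dots i_k i_{k+1}} = \begin{cases} 0, & \text{if } i_l \sim i_{k+1}, \\[4pt] \dfrac{T^{i_1 \dots i_{l-1} i_{k+1} i_{l+1} \dots i_k} - T^{i_1 \dots i_{l-1} i_l i_{l+1} \dots i_k}}{\mu_{i_{k+1}} - \mu_{i_l}}, & \text{if } i_l \not\sim i_{k+1}. \end{cases} \tag{8}$$

Notice that if $T$ is a $\mu$-block-constant tensor, then so is $T^{(l)}_{\text{out}}$ for each $l = 1, \dots, k$. The next theorem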
is Corollary 5.8 from 

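Theorem 2.4 Let $\{M_m\}$ be a sequence of symmetric matrices converging to 0, such that $M_m / \|M_m\|$ converges to $M$. Let $\mu$ be in $\mathbb{R}^n_{\downarrow}$ and $U_m \to U \in O^n$ be a sequence of orthogonal matrices such that

$$\operatorname{Diag} \mu + M_m = U_m\big(\operatorname{Diag} \lambda(\operatorname{Diag} \mu + M_m)\big)U_m^T, \quad \text{for all } m = 1, 2, \dots.$$

Then for every block-constant $k$-tensor $T$ on $\mathbb{R}^n$, and any permutation $\sigma$ on $\mathbb{N}_k$ we have

$$\lim_{m \to \infty} \frac{U_m(\operatorname{Diag}^{\sigma} T)U_m^T - \operatorname{Diag}^{\sigma} T}{\|M_m\|} = \sum_{l=1}^{k} \big(\operatorname{Diag}^{\sigma_{(l)}} T^{(l)}_{\text{out}}\big)[M]. \tag{9}$$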

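Again, for a fixed vector $\mu \in \mathbb{R}^n$, we define $k$ linear maps

$$T \in T^{k,n} \to T^{(l)}_{\text{in}} \in T^{k+1,n}, \quad \text{for } l = 1, 2, \dots, k,$$

as follows:

$$(T^{(l)}_{\text{in}})^{i_1 \dots i_k i_{k+1}} = \begin{cases} T^{i_1 \dots i_{l-1} i_l i_{l+1} \dots i_k}, & \text{if } i_l \sim i_{k+1}, \\ 0, & \text{if } i_l \not\sim i_{k+1}. \end{cases} \tag{10}$$

Notice that if $T$ is a block-constant tensor, then so is $T^{(l)}_{\text{in}}$ for each $l = 1, \dots, k$. Finally, we define

$$(T^{\tau_l})^{i_1 \dots i_k i_{k+1}} = \begin{cases} T^{i_1 \dots i_{l-1} i_l i_{l+1} \dots i_k}, & \text{if } i_l = i_{k+1}, \\ 0, & \text{if } i_l \ne i_{k+1}. \end{cases} \tag{11}$$

In other words, $T^{\tau_l}$ is a $(k+1)$-tensor with entries off the hyperplane $i_l = i_{k+1}$ equal to zero. On the hyperplane $i_l = i_{k+1}$ we have placed the original tensor $T$. The next theorem is Corollary 5.6 from [H].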

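Theorem 2.5 Let $U \in O^n$ be a block-diagonal orthogonal matrix and let $\sigma$ be a permutation on $\mathbb{N}_k$. Let $M$ be an arbitrary symmetric matrix, and $h \in \mathbb{R}^n$ be a vector, such that $U^T M_{\text{in}} U = \operatorname{Diag} h$. Then

(i) for any block-constant $(k+1)$-tensor $T$ on $\mathbb{R}^n$,

$$U\big(\operatorname{Diag}^{\sigma}(T[h])\big)U^T = \big(\operatorname{Diag}^{\sigma_{(k+1)}} T\big)[M];$$

(ii) for any block-constant $k$-tensor $T$ on $\mathbb{R}^n$,

$$U\big(\operatorname{Diag}^{\sigma}(T^{\tau_l}[h])\big)U^T = \big(\operatorname{Diag}^{\sigma_{(l)}} T^{(l)}_{\text{in}}\big)[M], \quad \text{for all } l = 1, \dots, k,$$

where the permutations $\sigma_{(l)}$, for $l \in \mathbb{N}_k$, are defined by (7).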



3 Several standing assumptions 

Our approach is to successively differentiate the composition $f \circ \lambda$, where at every step we use the tensorial language presented in Section 2 to simplify the calculation. More precisely, we will define $k$-tensor valued maps $\mathcal{A}_{\sigma} : \mathbb{R}^n \to T^{k,n}$, $\sigma \in P^k$, (only in terms of the function $f$ and its partial derivatives) such that

$$\nabla^k (f \circ \lambda)(X) = V\Big(\sum_{\sigma \in P^k} \operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\lambda(X))\Big)V^T, \tag{12}$$

where $X = V(\operatorname{Diag} \lambda(X))V^T$. The formula for the gradient (the case $k = 1$) was originally derived in [3], see also Subsection 5.1 below. We showed in [S, Section 5] that, having derived that formula for $k = 1$, for $k \ge 2$ it is enough to show it under the following three assumptions.

• The matrix $X$ is diagonal, $X = \operatorname{Diag} \mu$, for some vector $\mu \in \mathbb{R}^n_{\downarrow}$.

• The sequence $\{M_m\}$ of symmetric matrices converges to 0 and is such that $M_m / \|M_m\|$ converges to $M$.

• A sequence of orthogonal matrices $U_m \in O^n$ is chosen such that

$$\operatorname{Diag} \mu + M_m = U_m\big(\operatorname{Diag} \lambda(\operatorname{Diag} \mu + M_m)\big)U_m^T, \quad \text{for all } m = 1, 2, \dots,$$

and $U_m$ approaches $U \in O^n$ as $m$ goes to infinity. ($U$ is block diagonal with blocks determined by $\mu$.)

The next lemma (the proof is a simple combination of Lemma 5.10 in [5] and Theorem 3.12 in [3]) justifies the notation that follows. Recall that $\mu \in \mathbb{R}^n$ partitions $\mathbb{N}_n$ into $r$ blocks $I_1, \dots, I_r$.

Lemma 3.1 For any $\mu \in \mathbb{R}^n_{\downarrow}$ and sequence of symmetric matrices $M_m \to 0$ we have that

$$\lambda(\operatorname{Diag} \mu + M_m)^T = \mu^T + \big(\lambda(X_1^T M_m X_1)^T, \dots, \lambda(X_r^T M_m X_r)^T\big)^T + o(\|M_m\|), \tag{13}$$

where $X_l := [e^i \mid i \in I_l]$, for all $l = 1, \dots, r$.

Throughout the whole paper, we denote

$$h_m := \big(\lambda(X_1^T M_m X_1)^T, \dots, \lambda(X_r^T M_m X_r)^T\big)^T. \tag{14}$$

If also $M_m / \|M_m\|$ converges to $M$ as $m$ goes to infinity, then, since the eigenvalues are continuous functions, we can define

$$h := \lim_{m \to \infty} \frac{h_m}{\|M_m\|} = \big(\lambda(X_1^T M X_1)^T, \dots, \lambda(X_r^T M X_r)^T\big)^T. \tag{15}$$

We reserve the symbols $h_m$ and $h$ to denote the above two vectors throughout the paper. With this notation Lemma 3.1 says that if $M_m \to 0$, then

$$\lambda(\operatorname{Diag} \mu + M_m)^T = \mu^T + h_m^T + o(\|M_m\|). \tag{16}$$
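
The first-order expansion of Lemma 3.1 can be observed numerically. The following sketch fixes a vector $\mu$ with repeated entries, perturbs $\operatorname{Diag} \mu$ by $tM$ for decreasing $t$, and measures the distance between $\lambda(\operatorname{Diag} \mu + tM)$ and $\mu + h_m$. The particular $\mu$, the random direction, and the helper names are assumptions chosen only for illustration.

```python
import numpy as np

def eig_desc(A):
    # Eigenvalues of a symmetric matrix in nonincreasing order, as in the text.
    return np.linalg.eigvalsh(A)[::-1]

mu = np.array([2.0, 2.0, 1.0, 0.0, 0.0, 0.0])      # a vector with nonincreasing entries
block_list = [[0, 1], [2], [3, 4, 5]]              # 0-based blocks I_1, I_2, I_3 of mu

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
M_dir = (B + B.T) / 2
M_dir /= np.linalg.norm(M_dir)                     # symmetric direction of unit norm

def first_order_error(t):
    """Norm of lambda(Diag mu + M) - (mu + h) for the perturbation M = t * M_dir."""
    M = t * M_dir
    # X_l^T M X_l is the principal submatrix of M indexed by the block I_l.
    h = np.concatenate([eig_desc(M[np.ix_(b, b)]) for b in block_list])
    return np.linalg.norm(eig_desc(np.diag(mu) + M) - (mu + h))

errors = {t: first_order_error(t) for t in (1e-2, 1e-3, 1e-4)}
```

The error divided by $t$ should shrink as $t$ decreases, in line with the $o(\|M_m\|)$ remainder.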
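If, for the fixed vector $\mu \in \mathbb{R}^n_{\downarrow}$, we define

$$M_{\text{in}}^{ij} = \begin{cases} M^{ij}, & \text{if } i \sim j, \\ 0, & \text{otherwise}, \end{cases}$$

then Theorem 4.2 in [H] says that the orthogonal matrix $U$ is block-diagonal and satisfies

$$U^T M_{\text{in}} U = \operatorname{Diag} h. \tag{17}$$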

4 Analyticity of isolated eigenvalues 

Let $A$ be in $S^n$ and suppose that the $j$-th largest eigenvalue is isolated, that is

$$\lambda_{j-1}(A) > \lambda_j(A) > \lambda_{j+1}(A).$$

The goal of this section is to give two justifications of the known fact that $\lambda_j(\cdot)$ is an analytic function in a neighbourhood of $A$. We call a function of several real variables analytic at a point if in a neighbourhood of this point it has a power series expansion. The corresponding complex variable notion is called holomorphic.

The first justification below is from [1, Theorem 2.1].
Theorem 4.1 Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is a function analytic at the point $\lambda(A)$ for some $A$ in $S^n$. Suppose also $f(Px) = f(x)$ for every permutation matrix, $P$, for which $P\lambda(A) = \lambda(A)$. Then, the function $f \circ \lambda$ is analytic at $A$.

To see how this theorem implies the analyticity of $\lambda_j(\cdot)$ take

$$f(x_1, \dots, x_n) = \text{the } j\text{-th largest element of } \{x_1, \dots, x_n\}.$$

The function $f$ is a piece-wise affine function. Moreover, for any $x \in \mathbb{R}^n$ in a neighbourhood of the vector $\lambda(A)$ it is given by

$$f(x) = x_j.$$

Thus, $f$ is analytic in that neighbourhood. Next, $f$ is a symmetric function and thus by definition $f(Px) = f(x)$ for every $x \in \mathbb{R}^n$ and every permutation matrix $P$. Therefore by the theorem $\lambda_j = f \circ \lambda$ is an analytic function.

For the second justification we use the following result [Q]. (In the theorem below, $\lambda_i(X)$ denotes an arbitrary eigenvalue of a matrix $X$, not necessarily the $i$-th largest one.)

Theorem 4.2 (Arnold 1971) Suppose that $A \in \mathbb{C}^{n \times n}$ has $q$ eigenvalues $\lambda_1(A), \dots, \lambda_q(A)$ (counting multiplicities) in an open set $\Omega \subset \mathbb{C}$, and the remaining $n - q$ eigenvalues not in the closure of $\Omega$. Then, there is a neighbourhood $\Delta$ of $A$ and holomorphic mappings $S : \Delta \to \mathbb{C}^{q \times q}$ and $T : \Delta \to \mathbb{C}^{(n-q) \times (n-q)}$ such that for all $X \in \Delta$

$$X \text{ is similar to } \begin{pmatrix} S(X) & 0 \\ 0 & T(X) \end{pmatrix},$$

and $S(A)$ has eigenvalues $\lambda_1(A), \dots, \lambda_q(A)$.

To deduce the result we need, since the $j$-th largest eigenvalue is isolated, we can find an open set $\Omega \subset \mathbb{C}$, such that only that eigenvalue is in $\Omega$ and the remaining $n - 1$ are not in the closure of $\Omega$. By the theorem, there is a neighbourhood $\Delta$ of $A$ and a holomorphic mapping $S : \Delta \to \mathbb{C}$ such that $S(X)$ is equal to the $j$-th largest eigenvalue of $X$ for all $X$ in $\Delta$.

If $A$ is a real symmetric matrix, then the intersection of $\Delta$ with $S^n$ is a neighbourhood of $A$ in $S^n$. Let $\tilde{S}(X)$ denote the restriction of $S(X)$ to $\Delta \cap S^n$. Clearly, $\tilde{S}(X)$ is a holomorphic, real valued function. Therefore (it is a standard result in complex analysis) the coefficients in the power series expansion of $\tilde{S}(X)$ must be real numbers. Thus, the $j$-th largest eigenvalue is a real analytic function in the neighbourhood $\Delta \cap S^n$ of $A$.

All these considerations make the following observation clear.

Theorem 4.3 Suppose that $A \in S^n$ has distinct eigenvalues and $f : \mathbb{R}^n \to \mathbb{R}$ is $k$-times (continuously) differentiable in a neighbourhood of $\lambda(A)$. Then, $f \circ \lambda$ is $k$-times (continuously) differentiable in a neighbourhood of $A$.
5 The k-th derivative of functions of eigenvalues at a matrix with distinct eigenvalues

Let $f : \mathbb{R}^n \to \mathbb{R}$ be an arbitrary $k$-times (continuously) differentiable function. In this section, we do not assume that $f$ is a symmetric function. Our goal in this section is to derive a formula for the $k$-th derivative of $f \circ \lambda$ on the set $\lambda^{-1}(\Omega)$, where

$$\Omega = \{x \in \mathbb{R}^n \mid x_i \ne x_j \text{ for every } i \ne j\}, \quad \text{and} \quad \lambda^{-1}(\Omega) = \{A \in S^n \mid \lambda(A) \in \Omega\}.$$

Clearly $\Omega$ is a dense, open subset of $\mathbb{R}^n$ and $\lambda^{-1}(\Omega)$ is a dense, open subset of $S^n$.

As an example of how one can differentiate $f \circ \lambda$, let us consider the general situation. Let $X$, $Y$, and $Z$ be Banach spaces and let $g : X \to Y$, $G : Y \to Z$. Then, by applying the chain rule we have the following formulae for the first three derivatives of $\phi = G \circ g$ (see [2, Section X.4]) for any vectors $h_1, h_2, h_3$ from $X$:

$$\begin{aligned}
\nabla \phi(x)[h_1] &= \nabla G(g(x))[\nabla g(x)[h_1]], \\
\nabla^2 \phi(x)[h_1, h_2] &= \nabla^2 G(g(x))[\nabla g(x)[h_1], \nabla g(x)[h_2]] + \nabla G(g(x))[\nabla^2 g(x)[h_1, h_2]], \\
\nabla^3 \phi(x)[h_1, h_2, h_3] &= \nabla^3 G(g(x))[\nabla g(x)[h_1], \nabla g(x)[h_2], \nabla g(x)[h_3]] \\
&\quad + \nabla^2 G(g(x))[\nabla g(x)[h_1], \nabla^2 g(x)[h_2, h_3]] \\
&\quad + \nabla^2 G(g(x))[\nabla g(x)[h_2], \nabla^2 g(x)[h_1, h_3]] \\
&\quad + \nabla^2 G(g(x))[\nabla g(x)[h_3], \nabla^2 g(x)[h_1, h_2]] \\
&\quad + \nabla G(g(x))[\nabla^3 g(x)[h_1, h_2, h_3]].
\end{aligned}$$

In our case, we have $X = S^n$, $Y = \mathbb{R}^n$, $Z = \mathbb{R}$, $g = \lambda$, and $G = f$. As can be seen from the above example, this approach very quickly becomes unmanageable. The formula for the $k$-th derivative of the composition requires formulae for every derivative of $\lambda$ up to the $k$-th. It is not clear how one can organize and simplify the resulting expression into a compact, ordered formula.

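Fix a vector $\mu \in \mathbb{R}^n_{\downarrow} \cap \Omega$. Since $\mu$ has distinct entries, every block in the partition that it defines will have exactly one element. This means that for any $i, j \in \mathbb{N}_n$, $i \sim j \Leftrightarrow i = j$, and that makes any tensor block-constant. In particular, for the matrices $X_l$ defined in Lemma 3.1 we have $X_l = [e^l]$, $l = 1, \dots, n$. This implies that $h_m = \operatorname{diag} M_m$ and that $h = \operatorname{diag} M$. Notice, finally, how the definition of $T^{(l)}_{\text{out}}$ changes:

$$(T^{(l)}_{\text{out}})^{i_1 \dots i_k i_{k+1}} = \begin{cases} 0, & \text{if } i_l = i_{k+1}, \\[4pt] \dfrac{T^{i_1 \dots i_{l-1} i_{k+1} i_{l+1} \dots i_k} - T^{i_1 \dots i_{l-1} i_l i_{l+1} \dots i_k}}{\mu_{i_{k+1}} - \mu_{i_l}}, & \text{if } i_l \ne i_{k+1}. \end{cases}$$

We will derive Formula (12) by induction on the order of the derivative. For completeness, we begin by recalculating the formula for the gradient.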
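5.1 The gradient

Using Formula (16) we compute

$$\begin{aligned}
\lim_{m \to \infty} \frac{(f \circ \lambda)(\operatorname{Diag} \mu + M_m) - (f \circ \lambda)(\operatorname{Diag} \mu)}{\|M_m\|}
&= \lim_{m \to \infty} \frac{f(\mu + h_m + o(\|M_m\|)) - f(\mu)}{\|M_m\|} \\
&= \lim_{m \to \infty} \frac{f(\mu) + \nabla f(\mu)[h_m] + o(\|M_m\|) - f(\mu)}{\|M_m\|} \\
&= \nabla f(\mu)[h] \\
&= \langle \nabla f(\mu), \operatorname{diag} M \rangle \\
&= (\operatorname{Diag} \nabla f(\mu))[M].
\end{aligned}$$

This shows that $\nabla (f \circ \lambda)(\operatorname{Diag} \mu) = \operatorname{Diag}^{(1)} \nabla f(\mu)$. It is easy to see now that

$$\nabla (f \circ \lambda)(X) = V\big(\operatorname{Diag}^{(1)} \nabla f(\lambda(X))\big)V^T = V\Big(\sum_{\sigma \in P^1} \operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\lambda(X))\Big)V^T, \tag{19}$$

where $X = V(\operatorname{Diag} \lambda(X))V^T$ and $\mathcal{A}_{(1)}(x) = \nabla f(x)$. Trivially, if $f$ is $k$-times (continuously) differentiable, then $\mathcal{A}_{(1)}(x) = \nabla f(x)$ is $(k-1)$-times (continuously) differentiable.

Note that when the eigenvalues of $X$ are not distinct, the calculation of the gradient of $f \circ \lambda$ is almost identical and leads to the same final formula. Indeed, using Equation (17),

$$\nabla f(\mu)[h] = \langle \nabla f(\mu), \operatorname{diag}(U^T M_{\text{in}} U) \rangle = \big(U(\operatorname{Diag} \nabla f(\mu))U^T\big)[M] = (\operatorname{Diag} \nabla f(\mu))[M],$$

where in the last equality we used the fact that $U$ is block-diagonal and orthogonal, and $\nabla f(\mu)$ is block-constant.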
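
Formula (19) lends itself to a finite-difference check. The sketch below compares the directional derivative predicted by $V(\operatorname{Diag} \nabla f(\lambda(X)))V^T$ with a central difference quotient, once at a random symmetric matrix and once at a diagonal matrix with a repeated eigenvalue. The test function (log-sum-exp), the step size, and the helper names are assumptions made for this illustration. NumPy's `eigh` returns eigenvalues in ascending rather than nonincreasing order, which does not affect the formula as long as eigenvalues and eigenvectors are ordered consistently.

```python
import numpy as np

def f(x):
    # A smooth symmetric function of the eigenvalues (log-sum-exp), chosen as an example.
    return float(np.log(np.sum(np.exp(x))))

def grad_f(x):
    e = np.exp(x)
    return e / e.sum()

def F(X):
    return f(np.linalg.eigvalsh(X))

def grad_F(X):
    # Candidate gradient V Diag(grad f(lambda(X))) V^T.
    lam, V = np.linalg.eigh(X)
    return V @ np.diag(grad_f(lam)) @ V.T

def gradient_gap(X, M, t=1e-5):
    """|central difference of F along M  -  <grad_F(X), M>|."""
    fd = (F(X + t * M) - F(X - t * M)) / (2 * t)
    return abs(fd - np.sum(grad_F(X) * M))

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
X_generic = (B + B.T) / 2
C = rng.standard_normal((4, 4))
M = (C + C.T) / 2
X_repeated = np.diag([1.0, 1.0, 0.0, -1.0])     # repeated eigenvalue, f symmetric

gap_generic = gradient_gap(X_generic, M)
gap_repeated = gradient_gap(X_repeated, M)
```

Both gaps should be small relative to the size of the derivative, consistent with the remark that the gradient formula is unchanged when eigenvalues coincide and $f$ is symmetric.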

5.2 The induction step 

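Suppose now that for some $1 \le s < k$

$$\nabla^s (f \circ \lambda)(X) = V\Big(\sum_{\sigma \in P^s} \operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\lambda(X))\Big)V^T,$$

where $X = V(\operatorname{Diag} \lambda(X))V^T$. Suppose also that for every $\sigma \in P^s$, the $s$-tensor valued map $\mathcal{A}_{\sigma} : \mathbb{R}^n \to T^{s,n}$ is $(k-s)$-times (continuously) differentiable.

Using Formula (16), we differentiate $\nabla^s (f \circ \lambda)$ at the matrix $\operatorname{Diag} \mu$:

$$\begin{aligned}
\nabla^{s+1} (f \circ \lambda)(\operatorname{Diag} \mu)[M]
&= \lim_{m \to \infty} \frac{\nabla^s (f \circ \lambda)(\operatorname{Diag} \mu + M_m) - \nabla^s (f \circ \lambda)(\operatorname{Diag} \mu)}{\|M_m\|} \\
&= \lim_{m \to \infty} \frac{U_m\big(\sum_{\sigma \in P^s} \operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\lambda(\operatorname{Diag} \mu + M_m))\big)U_m^T - \sum_{\sigma \in P^s} \operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\mu)}{\|M_m\|} \\
&= \lim_{m \to \infty} \sum_{\sigma \in P^s} \frac{U_m\big(\operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\lambda(\operatorname{Diag} \mu + M_m))\big)U_m^T - \operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\mu)}{\|M_m\|} \\
&= \lim_{m \to \infty} \sum_{\sigma \in P^s} \frac{U_m\big(\operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\mu + h_m + o(\|M_m\|))\big)U_m^T - \operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\mu)}{\|M_m\|} \\
&= \lim_{m \to \infty} \sum_{\sigma \in P^s} \frac{U_m\big(\operatorname{Diag}^{\sigma} (\mathcal{A}_{\sigma}(\mu) + \nabla \mathcal{A}_{\sigma}(\mu)[h_m] + o(\|M_m\|))\big)U_m^T - \operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\mu)}{\|M_m\|} \\
&= \sum_{\sigma \in P^s} \lim_{m \to \infty} \frac{U_m\big(\operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\mu)\big)U_m^T - \operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\mu)}{\|M_m\|} + \sum_{\sigma \in P^s} U\big(\operatorname{Diag}^{\sigma} (\nabla \mathcal{A}_{\sigma}(\mu)[h])\big)U^T.
\end{aligned}$$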

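By Theorem 2.4, since for every $\sigma \in P^s$ the tensor $\mathcal{A}_{\sigma}(\mu)$ is block-constant, we have

$$\lim_{m \to \infty} \frac{U_m\big(\operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\mu)\big)U_m^T - \operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\mu)}{\|M_m\|} = \sum_{l=1}^{s} \big(\operatorname{Diag}^{\sigma_{(l)}} (\mathcal{A}_{\sigma}(\mu))^{(l)}_{\text{out}}\big)[M].$$

By Theorem 2.5, since for every $\sigma \in P^s$ the gradient $\nabla \mathcal{A}_{\sigma}(\mu)$ is a block-constant $(s+1)$-tensor, we have

$$U\big(\operatorname{Diag}^{\sigma} (\nabla \mathcal{A}_{\sigma}(\mu)[h])\big)U^T = \big(\operatorname{Diag}^{\sigma_{(s+1)}} \nabla \mathcal{A}_{\sigma}(\mu)\big)[M].$$

Thus we define

$$\mathcal{A}_{\sigma_{(l)}}(\mu) := (\mathcal{A}_{\sigma}(\mu))^{(l)}_{\text{out}}, \ \text{for all } l \in \mathbb{N}_s, \quad \text{and} \quad \mathcal{A}_{\sigma_{(s+1)}}(\mu) := \nabla \mathcal{A}_{\sigma}(\mu).$$

Putting everything together, we conclude that for every symmetric matrix $M$:

$$\nabla^{s+1} (f \circ \lambda)(\operatorname{Diag} \mu)[M] = \Big(\sum_{\substack{\sigma \in P^s \\ l \in \mathbb{N}_{s+1}}} \operatorname{Diag}^{\sigma_{(l)}} \mathcal{A}_{\sigma_{(l)}}(\mu)\Big)[M].$$

Notice the parameters of the summation sign in the above formula. As $\sigma$ goes over the elements of $P^s$ and as $l$ goes over the set $\mathbb{N}_{s+1}$, the permutation $\sigma_{(l)}$ covers, in a one-to-one manner, all permutations in $P^{s+1}$. Now, the comments in [H, Section 5] show that

$$\nabla^{s+1} (f \circ \lambda)(X) = V\Big(\sum_{\substack{\sigma \in P^s \\ l \in \mathbb{N}_{s+1}}} \operatorname{Diag}^{\sigma_{(l)}} \mathcal{A}_{\sigma_{(l)}}(\lambda(X))\Big)V^T,$$

where $X = V(\operatorname{Diag} \lambda(X))V^T$.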

To finish the induction, we have to show that the $(s+1)$-tensor valued maps $\mathcal{A}_{\sigma_{(l)}}(\cdot)$ are at least $(k-s-1)$-times (continuously) differentiable. This is clear when $l = s+1$ and $\sigma \in P^s$, since $\mathcal{A}_{\sigma}(\cdot)$ is $(k-s)$-times (continuously) differentiable for every $\sigma \in P^s$. For the rest of the maps this is also easy to see. Every entry in $\mathcal{A}_{\sigma_{(l)}}$ is the difference of two entries of $\mathcal{A}_{\sigma}$ divided by a quantity that never becomes zero over the set $\Omega$. This shows that over the set $\Omega$, $\mathcal{A}_{\sigma_{(l)}}(\cdot)$ is $(k-s)$-times (continuously) differentiable for every $\sigma \in P^s$ and every $l \in \mathbb{N}_s$.

We summarize everything in the next theorem.

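Theorem 5.1 Let $X$ be a symmetric matrix with distinct eigenvalues and let $f$ be a $k$-times (continuously) differentiable function on $\mathbb{R}^n$. Let $V$ be an orthogonal matrix such that $X = V(\operatorname{Diag} \lambda(X))V^T$. Then, $f \circ \lambda$ is a $k$-times (continuously) differentiable function at $X$. Moreover, if $\nabla^s (f \circ \lambda)$, for some $s < k$, is given by

$$\nabla^s (f \circ \lambda)(X) = V\Big(\sum_{\sigma \in P^s} \operatorname{Diag}^{\sigma} \mathcal{A}_{\sigma}(\lambda(X))\Big)V^T,$$

for some $s$-tensor valued mappings $\mathcal{A}_{\sigma} : \mathbb{R}^n \to T^{s,n}$, for every $\sigma \in P^s$, then $\nabla^{s+1} (f \circ \lambda)$ is given by

$$\nabla^{s+1} (f \circ \lambda)(X) = V\Big(\sum_{\substack{\sigma \in P^s \\ l \in \mathbb{N}_{s+1}}} \operatorname{Diag}^{\sigma_{(l)}} \mathcal{A}_{\sigma_{(l)}}(\lambda(X))\Big)V^T, \tag{20}$$

where

$$\mathcal{A}_{\sigma_{(l)}} = (\mathcal{A}_{\sigma})^{(l)}_{\text{out}}, \ \text{for all } l \in \mathbb{N}_s, \quad \text{and} \quad \mathcal{A}_{\sigma_{(s+1)}} = \nabla \mathcal{A}_{\sigma}.$$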



6 The k-th derivative of separable spectral functions

In this section we show that Formula (12) holds at an arbitrary symmetric matrix $X$ (not necessarily with distinct eigenvalues) for a subclass of spectral functions that we now describe.

Let $g$ be a real function on an interval $I$. If $D = \operatorname{Diag}(\lambda_1, \dots, \lambda_n)$ is a diagonal matrix with diagonal entries $\lambda_i \in I$, $i = 1, \dots, n$, we define

$$G(D) = \operatorname{Diag}(g(\lambda_1), \dots, g(\lambda_n)). \tag{21}$$

If $X$ is a symmetric matrix with eigenvalues $\lambda_i$ in $I$, we choose an orthogonal matrix $V$ such that $X = V(\operatorname{Diag} \lambda(X))V^T$ and then define

$$G(X) = V\,G(\operatorname{Diag} \lambda(X))\,V^T. \tag{22}$$

In this way we obtain a (well-defined) symmetric-matrix valued function with domain the set of all matrices $X$ with eigenvalues in $I$.

These functions have been the object of recent interest in optimization and are the main object of [2, Chapter V], where their gradient is computed using an approximation argument. Notice that $G(X)$ is just the gradient (see Formula (19)) of the spectral function $f \circ \lambda$, where $f(x) = \hat{g}(x_1) + \cdots + \hat{g}(x_n)$, and $\hat{g}(s) = \int^{s} g(t)\,dt$. That is why we will call those functions separable spectral functions.
6.1 Description of the k-th derivative

Let $g : I \to \mathbb{R}$ be $k$-times (continuously) differentiable. Define the symmetric function $g^{[(12)]}(x, y) : I \times I \to \mathbb{R}$ as

$$g^{[(12)]}(x, y) = \begin{cases} \dfrac{g(x) - g(y)}{x - y}, & \text{if } x \ne y, \\[4pt] g'(x), & \text{if } x = y. \end{cases} \tag{23}$$

The integral representation $g^{[(12)]}(x, y) = \int_0^1 g'(y + t(x - y))\,dt$ shows that $g^{[(12)]}(x, y)$ is as smooth, in both arguments, as $g'$.
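
The agreement between the two descriptions of $g^{[(12)]}$ can be checked with a short pure-Python sketch. It evaluates the case definition directly and approximates the integral representation by a composite Simpson rule. The example $g = \sin$, the number of panels, and the function names are assumptions of this illustration.

```python
import math

def divided_difference(g, dg, x, y):
    """First divided difference: (g(x) - g(y)) / (x - y), or g'(x) when x == y."""
    return dg(x) if x == y else (g(x) - g(y)) / (x - y)

def integral_form(dg, x, y, panels=200):
    """Composite Simpson approximation of the integral of g'(y + t (x - y)) over [0, 1].

    `panels` must be even.
    """
    step = 1.0 / panels
    total = dg(y) + dg(x)                      # integrand at t = 0 and t = 1
    for j in range(1, panels):
        weight = 4 if j % 2 else 2
        total += weight * dg(y + j * step * (x - y))
    return total * step / 3

g, dg = math.sin, math.cos
gap_distinct = abs(divided_difference(g, dg, 0.3, 1.7) - integral_form(dg, 0.3, 1.7))
gap_equal = abs(divided_difference(g, dg, 1.0, 1.0) - integral_form(dg, 1.0, 1.0))
symmetry_gap = abs(divided_difference(g, dg, 0.3, 1.7) - divided_difference(g, dg, 1.7, 0.3))
```

The integral form has no special case at $x = y$, which is what makes the smoothness claim transparent.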

Denote by $\tilde{P}^k$ the set of all permutations from $P^k$ that have one cycle in their cycle decomposition. Clearly $|\tilde{P}^k| = (k-1)!$. Notice that for every $\sigma \in \tilde{P}^k$ and every $l \in \mathbb{N}_k$ we have $\sigma_{(l)} \in \tilde{P}^{k+1}$. Moreover, as $\sigma$ varies over $\tilde{P}^k$ and $l$ varies over $\mathbb{N}_k$, the permutation $\sigma_{(l)}$ varies over $\tilde{P}^{k+1}$ in a one-to-one and onto fashion.

Suppose that for every $\sigma \in \tilde{P}^k$ we have defined the function $g^{[\sigma]}(x_1, \dots, x_k)$ on the set $I \times I \times \cdots \times I$ ($k$ times), and suppose that these functions are as smooth as $g^{(k-1)}$ (the $(k-1)$-th derivative of $g$). For every $\sigma \in \tilde{P}^k$ and every $l \in \mathbb{N}_k$ we define the function $g^{[\sigma_{(l)}]}(x_1, \dots, x_k, x_{k+1})$ as follows:

$$g^{[\sigma_{(l)}]}(x_1, \dots, x_{k+1}) = \begin{cases} \nabla_l\, g^{[\sigma]}(x_1, \dots, x_k), & \text{if } x_l = x_{k+1}, \\[4pt] \dfrac{g^{[\sigma]}(x_1, \dots, x_l, \dots, x_k) - g^{[\sigma]}(x_1, \dots, x_{k+1}, \dots, x_k)}{x_l - x_{k+1}}, & \text{if } x_l \ne x_{k+1}, \end{cases} \tag{24}$$

where in the second case of the definition, both $x_l$ and $x_{k+1}$ are in the $l$-th position, and $\nabla_l$ denotes the partial derivative with respect to the $l$-th argument. Using the integral formula

$$g^{[\sigma_{(l)}]}(x_1, \dots, x_{k+1}) = \int_0^1 \nabla_l\, g^{[\sigma]}(x_1, \dots, x_{l-1}, x_{k+1} + t(x_l - x_{k+1}), x_{l+1}, \dots, x_k)\,dt,$$

for every $l \in \mathbb{N}_k$, we see that $g^{[\sigma_{(l)}]}(x_1, \dots, x_{k+1})$ is as smooth as $g^{(k)}$, the $k$-th derivative of $g$.
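
To see the recursion (24) at work in the simplest case, take $\sigma = (12)$ and build the two second-order functions obtained for $l = 1$ and $l = 2$. The sketch below handles only pairwise distinct arguments (the derivative branch of (24) is deliberately left out) and checks that both choices of $l$ give the same value, in line with the remark in the introduction that the maps coincide in the separable case. The function names and the test functions are assumptions of this illustration.

```python
import math

def dd1(g, x, y):
    # g^{[(12)]}(x, y) for x != y.
    return (g(x) - g(y)) / (x - y)

def dd2_l1(g, x1, x2, x3):
    # Recursion (24) with l = 1: replace the first argument x1 by x3.
    return (dd1(g, x1, x2) - dd1(g, x3, x2)) / (x1 - x3)

def dd2_l2(g, x1, x2, x3):
    # Recursion (24) with l = 2: replace the second argument x2 by x3.
    return (dd1(g, x1, x2) - dd1(g, x1, x3)) / (x2 - x3)

def cube(t):
    return t ** 3

points = (0.1, 0.7, 1.5)
exp_l1 = dd2_l1(math.exp, *points)
exp_l2 = dd2_l2(math.exp, *points)
cube_l1 = dd2_l1(cube, *points)     # for g(t) = t^3 this should equal x1 + x2 + x3
```

For polynomials the result can be predicted in closed form, which gives an independent check of the implementation.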
Finally, for every $s \in \{2, 3, \dots, k+1\}$ and every $\sigma \in \tilde{P}^s$, we define an $s$-tensor valued map $g^{[\sigma]} : \mathbb{R}^n \to T^{s,n}$, where

$$(g^{[\sigma]}(\mu))^{i_1 \dots i_s} := g^{[\sigma]}(\mu_{i_1}, \dots, \mu_{i_s}). \tag{25}$$

Clearly, if $(i_1, \dots, i_s) \sim_{\mu} (j_1, \dots, j_s)$, then $(g^{[\sigma]}(\mu))^{i_1 \dots i_s} = (g^{[\sigma]}(\mu))^{j_1 \dots j_s}$, which shows that $g^{[\sigma]}(\mu)$ is a $\mu$-block-constant tensor for every $\mu$. Moreover, the map $g^{[\sigma]} : \mathbb{R}^n \to T^{s,n}$ is still as smooth as $g^{(s-1)}$, for every $s = 2, 3, \dots, k+1$.

We are now ready to formulate the second main result of this work. The proof is given in the next subsection. A comparison between Theorem 6.1 and Theorem 5.1 is given at the end of this section.
Theorem 6.1 Let $g$ be a $k$-times (continuously) differentiable function defined on an interval $I$. Let $X$ be a symmetric matrix with eigenvalues in the interval $I$, and let $V$ be an orthogonal matrix such that $X = V(\operatorname{Diag} \lambda(X))V^T$. Then, the matrix valued function $G$ defined by (21) and (22) is $k$-times (continuously) differentiable at $X$. Moreover, its $k$-th derivative, $\nabla^k G(X)$, is given by the formula

$$\nabla^k G(X) = V\Big(\sum_{\sigma \in \tilde{P}^{k+1}} \operatorname{Diag}^{\sigma} g^{[\sigma]}(\lambda(X))\Big)V^T, \tag{26}$$

where the $(k+1)$-tensor valued maps $g^{[\sigma]}(\cdot)$ are defined by Equation (25).

6.2 Proof of Theorem 6.1: the gradient

Let $X$ be an $n \times n$ symmetric matrix with all eigenvalues in $I$ and such that $X = V(\operatorname{Diag} \lambda(X))V^T$ for some orthogonal matrix $V$. The formula for the gradient of separable spectral functions has been known for a while. For example, using approximation techniques, it was shown in [2] that for any two symmetric matrices $H_1$ and $H_2$

$$\nabla G(X)[H_1, H_2] = \big\langle V\big(g^{[(12)]}(\lambda(X)) \circ (V^T H_1 V)\big)V^T, H_2 \big\rangle, \tag{27}$$

where '$\circ$' stands for the usual Hadamard product.
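
Formula (27) can be compared against a finite-difference approximation of the derivative of $G$. The sketch below builds the matrix of first divided differences $g^{[(12)]}(\lambda_i, \lambda_j)$, forms $V(g^{[(12)]}(\lambda(X)) \circ (V^T H_1 V))V^T$, and compares it with a central difference of $G$ in the direction $H_1$. The choice $g = \exp$, the step size, and the helper names are assumptions of this illustration.

```python
import numpy as np

def G(X, g):
    # Matrix function G(X) = V Diag(g(lambda)) V^T.
    lam, V = np.linalg.eigh(X)
    return V @ np.diag(g(lam)) @ V.T

def first_divided_differences(lam, g, dg):
    """Matrix with entries g^{[(12)]}(lam_i, lam_j)."""
    n = len(lam)
    D = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            if lam[i] == lam[j]:
                D[i, j] = dg(lam[i])
            else:
                D[i, j] = (g(lam[i]) - g(lam[j])) / (lam[i] - lam[j])
    return D

def dG(X, H, g, dg):
    # Directional derivative of G at X along H, as in formula (27).
    lam, V = np.linalg.eigh(X)
    return V @ (first_divided_differences(lam, g, dg) * (V.T @ H @ V)) @ V.T

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4))
X = (B + B.T) / 2
C = rng.standard_normal((4, 4))
H1 = (C + C.T) / 2
E = rng.standard_normal((4, 4))
H2 = (E + E.T) / 2

t = 1e-5
fd = (G(X + t * H1, np.exp) - G(X - t * H1, np.exp)) / (2 * t)
matrix_gap = np.max(np.abs(fd - dG(X, H1, np.exp, np.exp)))
bilinear_gap = abs(np.sum(fd * H2) - np.sum(dG(X, H1, np.exp, np.exp) * H2))
```

Comparing exact floating-point equality of eigenvalues is adequate for this demonstration but would need a tolerance in a robust implementation.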

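In this subsection, we give a direct derivation of the gradient and, as a result, a slightly different representation of the above formula.

For convenience we denote $g(x) := (g(x_1), \dots, g(x_n))^T$ and $\nabla g(x) := (g'(x_1), \dots, g'(x_n))^T$, for any $x \in \mathbb{R}^n$. Thus we compute:

$$\begin{aligned}
\nabla G(\operatorname{Diag} \mu)[M]
&= \lim_{m \to \infty} \frac{G(\operatorname{Diag} \mu + M_m) - G(\operatorname{Diag} \mu)}{\|M_m\|} \\
&= \lim_{m \to \infty} \frac{U_m\big(\operatorname{Diag} g(\lambda(\operatorname{Diag} \mu + M_m))\big)U_m^T - \operatorname{Diag} g(\mu)}{\|M_m\|} \\
&= \lim_{m \to \infty} \frac{U_m\big(\operatorname{Diag} g(\mu + h_m + o(\|M_m\|))\big)U_m^T - \operatorname{Diag} g(\mu)}{\|M_m\|} \\
&= \lim_{m \to \infty} \frac{U_m\big(\operatorname{Diag}(g(\mu) + (\nabla g(\mu))^{\tau_1}[h_m] + o(\|M_m\|))\big)U_m^T - \operatorname{Diag} g(\mu)}{\|M_m\|} \\
&= \lim_{m \to \infty} \frac{U_m(\operatorname{Diag} g(\mu))U_m^T - \operatorname{Diag} g(\mu)}{\|M_m\|} + U\big(\operatorname{Diag}((\nabla g(\mu))^{\tau_1}[h])\big)U^T.
\end{aligned}$$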

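It is important to notice that both vectors $g(\mu)$ and $\nabla g(\mu)$ are block-constant (with respect to $\mu$). We use the second part of Theorem 2.5 with $\sigma = (1)$, $k = l = 1$ (notice that $\sigma_{(1)} = (12)$) and $T = \nabla g(\mu)$ to develop the second term above:

$$U\big(\operatorname{Diag}^{(1)}((\nabla g(\mu))^{\tau_1}[h])\big)U^T = \big(\operatorname{Diag}^{(12)} (\nabla g(\mu))^{(1)}_{\text{in}}\big)[M].$$

We now use Theorem 2.4 with $k = 1$, $\sigma = (1)$, and $T = g(\mu)$ to find the limit:

$$\lim_{m \to \infty} \frac{U_m(\operatorname{Diag} g(\mu))U_m^T - \operatorname{Diag} g(\mu)}{\|M_m\|} = \big(\operatorname{Diag}^{(12)} (g(\mu))^{(1)}_{\text{out}}\big)[M].$$

Putting everything together we get

$$\nabla G(\operatorname{Diag} \mu) = \operatorname{Diag}^{(12)} (\nabla g(\mu))^{(1)}_{\text{in}} + \operatorname{Diag}^{(12)} (g(\mu))^{(1)}_{\text{out}} = \operatorname{Diag}^{(12)} g^{[(12)]}(\mu),$$

where we used the easy to check fact that $g^{[(12)]}(\mu) = (\nabla g(\mu))^{(1)}_{\text{in}} + (g(\mu))^{(1)}_{\text{out}}$. Now, using Lemma 2.3, it is easy to see the following result (when $X$ is an arbitrary symmetric matrix, not just diagonal).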

Theorem 6.2 Let $g \in C^1(I)$ and let $X$ be a symmetric matrix with all eigenvalues in $I$. Then,

$$\nabla G(X) = V\big(\operatorname{Diag}^{(12)} g^{[(12)]}(\lambda(X))\big)V^T, \tag{28}$$

where $X = V(\operatorname{Diag} \lambda(X))V^T$.

For the sake of completeness, we show that Formula (28) is indeed the same as Formula (27). This is achieved when in the next result one substitutes the matrix $A$ with the matrix $g^{[(12)]}(\lambda(X))$.

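Proposition 6.3 For any $n \times n$ matrix $A$, any orthogonal $V$, and any symmetric $H_1$ and $H_2$, we have the equality

$$\big(V(\operatorname{Diag}^{(12)} A)V^T\big)[H_1, H_2] = \big\langle V\big(A \circ (V^T H_1 V)\big)V^T, H_2 \big\rangle,$$

where '$\circ$' stands for the ordinary Hadamard product.

Proof. We develop the two sides of the stated equality and compare the results. Write $\tilde{H}_i = V^T H_i V$, $i = 1, 2$. By Theorem 2.2 the left-hand side is equal to

$$\big(V(\operatorname{Diag}^{(12)} A)V^T\big)[H_1, H_2] = \langle A, \tilde{H}_1 \circ_{(12)} \tilde{H}_2 \rangle.$$

On the other hand

$$\big\langle V(A \circ \tilde{H}_1)V^T, H_2 \big\rangle = \langle A \circ \tilde{H}_1, \tilde{H}_2 \rangle = \langle A, \tilde{H}_1 \circ \tilde{H}_2 \rangle.$$

Finally, it is easy to check directly from the definitions that $\tilde{H}_1 \circ_{(12)} \tilde{H}_2 = \tilde{H}_1 \circ \tilde{H}_2^T = \tilde{H}_1 \circ \tilde{H}_2$, where in the last equality we used that $\tilde{H}_2$ is symmetric. ∎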
6.3 Proof of Theorem 6.1: the induction step

Suppose that $g : I \to \mathbb{R}$ is $k$-times (continuously) differentiable, and that the formula for the $(s-1)$-th derivative ($2 \le s < k+1$) of $G$ at the matrix $X$ is given by

$$\nabla^{(s-1)} G(X) = V\Big(\sum_{\sigma \in \tilde{P}^s} \operatorname{Diag}^{\sigma} g^{[\sigma]}(\lambda(X))\Big)V^T.$$

The $s$-tensor-valued maps $g^{[\sigma]} : \mathbb{R}^n \to T^{s,n}$ are at least $(k-s+1)$-times (continuously) differentiable. As we explained in Section 3, it is enough to derive the formula for $\nabla^s G(X)$ only in the case when $X = \operatorname{Diag} \mu$ for some $\mu \in \mathbb{R}^n_{\downarrow}$. We compute:

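$$\begin{aligned}
\nabla^s G(\operatorname{Diag} \mu)[M]
&= \lim_{m \to \infty} \frac{\nabla^{s-1} G(\operatorname{Diag} \mu + M_m) - \nabla^{s-1} G(\operatorname{Diag} \mu)}{\|M_m\|} \\
&= \lim_{m \to \infty} \frac{U_m\big(\sum_{\sigma \in \tilde{P}^s} \operatorname{Diag}^{\sigma} g^{[\sigma]}(\lambda(\operatorname{Diag} \mu + M_m))\big)U_m^T - \sum_{\sigma \in \tilde{P}^s} \operatorname{Diag}^{\sigma} g^{[\sigma]}(\mu)}{\|M_m\|} \\
&= \lim_{m \to \infty} \frac{U_m\big(\sum_{\sigma \in \tilde{P}^s} \operatorname{Diag}^{\sigma} g^{[\sigma]}(\mu + h_m + o(\|M_m\|))\big)U_m^T - \sum_{\sigma \in \tilde{P}^s} \operatorname{Diag}^{\sigma} g^{[\sigma]}(\mu)}{\|M_m\|} \\
&= \lim_{m \to \infty} \frac{U_m\big(\sum_{\sigma \in \tilde{P}^s} \operatorname{Diag}^{\sigma} (g^{[\sigma]}(\mu) + \nabla g^{[\sigma]}(\mu)[h_m] + o(\|M_m\|))\big)U_m^T - \sum_{\sigma \in \tilde{P}^s} \operatorname{Diag}^{\sigma} g^{[\sigma]}(\mu)}{\|M_m\|} \\
&= \lim_{m \to \infty} \frac{U_m\big(\sum_{\sigma \in \tilde{P}^s} \operatorname{Diag}^{\sigma} g^{[\sigma]}(\mu)\big)U_m^T - \sum_{\sigma \in \tilde{P}^s} \operatorname{Diag}^{\sigma} g^{[\sigma]}(\mu)}{\|M_m\|} + U\Big(\sum_{\sigma \in \tilde{P}^s} \operatorname{Diag}^{\sigma} (\nabla g^{[\sigma]}(\mu)[h])\Big)U^T.
\end{aligned}$$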

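First, using Theorem 2.4 we wrap up the limit in the above formula:

\[ \lim_{m\to\infty} \frac{U_m\big(\sum_{\sigma\in P^s} \mathrm{Diag}^{\sigma} g^{[\sigma]}(\mu)\big)U_m^T - \sum_{\sigma\in P^s} \mathrm{Diag}^{\sigma} g^{[\sigma]}(\mu)}{\|M_m\|} = \Big(\sum_{\sigma\in P^s} \sum_{l=1}^{s} \mathrm{Diag}^{\sigma_{(l)}} \big(g^{[\sigma]}(\mu)\big)^{(l)}_{\mathrm{out}}\Big)[M]. \tag{29} \]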



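Next, we focus our attention on the gradient $\nabla g^{[\sigma]}(\mu)$. Using the definition, Equation (25), we see
that

\[ \nabla\big[(g^{[\sigma]}(\mu))^{i_1 \dots i_s}\big] = \sum_{l=1}^{s} \nabla_l g^{[\sigma]}(\mu_{i_1}, \dots, \mu_{i_s})\, e^{i_l} = \sum_{l=1}^{s} g^{[\sigma_{(l)}]}(\mu_{i_1}, \dots, \mu_{i_s}, \mu_{i_l})\, e^{i_l}, \tag{30} \]

where $e^{i}$ denotes the $i$-th standard basis vector, and for the second equality we used Equation (21). This prompts us to define the $s$-tensor-valued
map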

$T_l : \mathbb{R}^n \to T^{s,n}$, where

\[ (T_l(\mu))^{i_1 \dots i_s} := g^{[\sigma_{(l)}]}(\mu_{i_1}, \dots, \mu_{i_s}, \mu_{i_l}), \tag{31} \]

for every $l \in \mathbb{N}_s$. Notice that $T_l(\mu)$ is a $\mu$-block-constant $s$-tensor, for every $\mu$ and every $l \in \mathbb{N}_s$.

Lemma 6.4 The gradient of $g^{[\sigma]}(\mu)$ allows the following decomposition

\[ \nabla g^{[\sigma]}(\mu) = \sum_{l=1}^{s} (T_l(\mu))^{\tau_l}, \tag{32} \]

where the "lifting" $(T_l(\mu))^{\tau_l}$ is defined by Equation (11).

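Proof. Fix a multi index $(i_1, \dots, i_s)$. By the definition of the gradient $\nabla g^{[\sigma]}(\mu)$ we have that

\[ \big((\nabla g^{[\sigma]}(\mu))^{i_1 \dots i_s 1}, \dots, (\nabla g^{[\sigma]}(\mu))^{i_1 \dots i_s n}\big)^T = \nabla\big[(g^{[\sigma]}(\mu))^{i_1 \dots i_s}\big]. \]

We compute the $p$-th entry in the above vector. On one hand, using Equation (30), we get:

\[ (\nabla g^{[\sigma]}(\mu))^{i_1 \dots i_s p} = \sum_{\substack{l=1 \\ i_l = p}}^{s} g^{[\sigma_{(l)}]}(\mu_{i_1}, \dots, \mu_{i_s}, \mu_{i_l}). \]

On the other, using Equation (11), we evaluate the right-hand side of (32):

\[
\Big(\sum_{l=1}^{s} (T_l(\mu))^{\tau_l}\Big)^{i_1 \dots i_s p}
= \sum_{l=1}^{s} \big((T_l(\mu))^{\tau_l}\big)^{i_1 \dots i_s p}
= \sum_{l=1}^{s} (T_l(\mu))^{i_1 \dots i_s}\, \delta_{i_l p}
= \sum_{\substack{l=1 \\ i_l = p}}^{s} (T_l(\mu))^{i_1 \dots i_s}
= \sum_{\substack{l=1 \\ i_l = p}}^{s} g^{[\sigma_{(l)}]}(\mu_{i_1}, \dots, \mu_{i_s}, \mu_{i_l}).
\]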

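We now continue the evaluation of $\nabla^s G(\mathrm{Diag}\,\mu)[M]$. Using Theorem 2.5 in the last equality
below, we find that

\[
\begin{aligned}
U\Big(\sum_{\sigma\in P^s} \mathrm{Diag}^{\sigma}\big(\nabla g^{[\sigma]}(\mu)[h]\big)\Big)U^T
&= U\Big(\sum_{\sigma\in P^s} \mathrm{Diag}^{\sigma}\Big(\Big(\sum_{l=1}^{s} (T_l(\mu))^{\tau_l}\Big)[h]\Big)\Big)U^T \\
&= \sum_{\sigma\in P^s} \sum_{l=1}^{s} U\Big(\mathrm{Diag}^{\sigma}\big((T_l(\mu))^{\tau_l}[h]\big)\Big)U^T \\
&= \Big(\sum_{\sigma\in P^s} \sum_{l=1}^{s} \mathrm{Diag}^{\sigma_{(l)}} (T_l(\mu))^{(l)}_{\mathrm{in}}\Big)[M].
\end{aligned}
\tag{33}
\]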

This already shows that $\nabla^{(s-1)} G(\mathrm{Diag}\,\mu)$ is differentiable. All that is left to do, now, is to show that
$\nabla^s G(\mathrm{Diag}\,\mu)$ has the desired form and properties. The last step is formulated in the next lemma.

Lemma 6.5 For every $l \in \mathbb{N}_s$ the following identity holds

\[ g^{[\sigma_{(l)}]}(\mu) = (T_l(\mu))^{(l)}_{\mathrm{in}} + \big(g^{[\sigma]}(\mu)\big)^{(l)}_{\mathrm{out}}. \]

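Proof. Fix a number $l \in \mathbb{N}_s$ and a multi index $(i_1, \dots, i_s, i_{s+1})$. We consider two cases depending
on whether or not $\mu_{i_{s+1}}$ equals $\mu_{i_l}$.

Case I. Suppose $i_l \sim i_{s+1}$. Then, the entry of the left-hand side, corresponding to the multi
index $(i_1, \dots, i_s, i_{s+1})$ is $g^{[\sigma_{(l)}]}(\mu_{i_1}, \dots, \mu_{i_s}, \mu_{i_{s+1}}) = \nabla_l g^{[\sigma]}(\mu_{i_1}, \dots, \mu_{i_s})$.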

On the other hand, the right-hand side evaluates to 

= (Wf^ 

= V^ M (/i n ,...,/i is ), 

where in the third equality we used Equation (10) and the fact that $T_l(\mu)$ is block-constant.

Case II. Suppose $i_l \not\sim i_{s+1}$. Then, the entry of the left-hand side, corresponding to the multi
index $(i_1, \dots, i_s, i_{s+1})$ is

\[ \frac{g^{[\sigma]}(\mu_{i_1}, \dots, \mu_{i_l}, \dots, \mu_{i_s}) - g^{[\sigma]}(\mu_{i_1}, \dots, \mu_{i_{s+1}}, \dots, \mu_{i_s})}{\mu_{i_l} - \mu_{i_{s+1}}}, \]

where both $\mu_{i_l}$ and $\mu_{i_{s+1}}$ are in position $l$. On the other hand, the right-hand side evaluates to

\[ \frac{g^{[\sigma]}(\mu_{i_1}, \dots, \mu_{i_{s+1}}, \dots, \mu_{i_s}) - g^{[\sigma]}(\mu_{i_1}, \dots, \mu_{i_l}, \dots, \mu_{i_s})}{\mu_{i_{s+1}} - \mu_{i_l}}. \]

In both cases, the two sides are equal and we are done.



Putting Equations (29) and (33) together, and using Lemma 6.5 concludes the inductive step
and proves Theorem 6.1.

In the special case when the matrix $X$ has distinct eigenvalues, it seems that Theorem 5.1 and Theorem 6.1
give two different formulae for the higher-order derivatives of a separable spectral function.
We now reconcile the differences. Suppose we have the formula for $\nabla^s G(X)$ given by Equation (26)
and apply to it the inductive procedure described in Theorem 5.1 to obtain $\nabla^{(s+1)} G(X)$. The calculations
in Subsection 6.3 showed that the gradient $A_{\sigma_{(s+1)}} = \nabla A_\sigma$ can be partitioned into $s$ pieces
(Lemma 6.4) and each piece can be added as an $s$-dimensional "diagonal plane" (Lemma 6.5) to
a corresponding tensor $A_{\sigma_{(l)}}$ for $l \in \mathbb{N}_s$. Doing that, we arrive at the formula for $\nabla^{(s+1)} G(X)$
given by Theorem 6.1.



6.4 $C^k$ separable spectral functions



Theorem 6.1 holds for every $k$-times differentiable function $g$. If in addition $g$ is $k$-times continuously
differentiable, then Formula (26) can be significantly simplified. This is what we describe
in this section. In particular, we show three properties of the functions $g^{[\sigma]}(x_1, \dots, x_s)$, for every
$2 \le s \le k+1$ and every $\sigma \in P^s$. First, we give a compact determinant formula for computing
$g^{[\sigma]}(x_1, \dots, x_s)$ directly. Second, as a consequence of the determinant formula we see that
$g^{[\sigma]}(x_1, \dots, x_s)$ is a symmetric function of its $s$ arguments. Third, denoting $\sigma_s = (12 \dots s)$,
all functions $g^{[\sigma]}(x_1, \dots, x_s)$ can be obtained from $g^{[\sigma_s]}(x_1, \dots, x_s)$ by a permutation of its arguments.
(Thus, knowing one of the tensors in Formula (26), namely $g^{[\sigma_s]}(\mu)$, we can obtain the rest by
permuting its "rows" and "columns".)

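Denote by $V(x_1, \dots, x_s)$ the Vandermonde determinant

\[
V(x_1, \dots, x_s) :=
\begin{vmatrix}
x_1^{s-1} & x_2^{s-1} & \cdots & x_s^{s-1} \\
x_1^{s-2} & x_2^{s-2} & \cdots & x_s^{s-2} \\
\vdots & \vdots & & \vdots \\
x_1 & x_2 & \cdots & x_s \\
1 & 1 & \cdots & 1
\end{vmatrix}
= \prod_{j < i} (x_j - x_i).
\]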



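For any $y \in \mathbb{R}^s$, denote by $V\binom{y_1, \dots, y_s}{x_1, \dots, x_s}$ the determinant

\[
V\binom{y_1, \dots, y_s}{x_1, \dots, x_s} :=
\begin{vmatrix}
y_1 & y_2 & \cdots & y_s \\
x_1^{s-2} & x_2^{s-2} & \cdots & x_s^{s-2} \\
\vdots & \vdots & & \vdots \\
x_1 & x_2 & \cdots & x_s \\
1 & 1 & \cdots & 1
\end{vmatrix}.
\]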


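Lemma 6.6 For any vector $(x_1, \dots, x_s, x_{s+1})$ with distinct coordinates, any $y \in \mathbb{R}^{s+1}$, and $l \in \mathbb{N}_s$
the following identity holds

\[
\frac{V\binom{y_1, \dots, y_s}{x_1, \dots, x_s}}{V(x_1, \dots, x_s)}
- \frac{V\binom{y_1, \dots, y_{l-1}, y_{s+1}, y_{l+1}, \dots, y_s}{x_1, \dots, x_{l-1}, x_{s+1}, x_{l+1}, \dots, x_s}}{V(x_1, \dots, x_{l-1}, x_{s+1}, x_{l+1}, \dots, x_s)}
= (x_l - x_{s+1})\,
\frac{V\binom{y_1, \dots, y_l, y_{s+1}, y_{l+1}, \dots, y_s}{x_1, \dots, x_l, x_{s+1}, x_{l+1}, \dots, x_s}}{V(x_1, \dots, x_l, x_{s+1}, x_{l+1}, \dots, x_s)}.
\]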



Proof. We consider both sides of the above identity as multivariate polynomials in the variables
$y_1, \dots, y_s, y_{s+1}$ and show that the coefficients in front of $y_k$ on both sides are equal for all $k \in \mathbb{N}_{s+1}$.
Notice first that

\[ V(x_1, \dots, x_{l-1}, x_{s+1}, x_{l+1}, \dots, x_s) = (-1)^{s-l}\, V(x_1, \dots, x_{l-1}, x_{l+1}, \dots, x_s, x_{s+1}), \]

\[ V\binom{y_1, \dots, y_{l-1}, y_{s+1}, y_{l+1}, \dots, y_s}{x_1, \dots, x_{l-1}, x_{s+1}, x_{l+1}, \dots, x_s} = (-1)^{s-l}\, V\binom{y_1, \dots, y_{l-1}, y_{l+1}, \dots, y_s, y_{s+1}}{x_1, \dots, x_{l-1}, x_{l+1}, \dots, x_s, x_{s+1}}. \]

We consider four cases according to the partition $\mathbb{N}_{s+1} = \{1, \dots, l-1\} \cup \{l\} \cup \{l+1, \dots, s\} \cup \{s+1\}$.
(In all product formulae below, it is assumed that the index $j < i$. This condition is omitted
for typographical reasons. Also, a hat on top of a factor in a product denotes that the factor is
missing.) First, let $k \in \{1, \dots, l-1\}$. The coefficient in front of $y_k$ in the expression on the left-hand
side is equal to



, ^ fc+ i rit,3gN s+ Ufc,s+i}( 3; J Xi "> _ ^_pk+i^-hjen s+ \{k,i}( x J X i) 



Ili,jGN s+ \{s+l}[ X j X * 



-iy 

(-!)*+! 



rii,jeN a+1 \{/}( x j x i 



(a?i - x k ) - ■ ■ (x k -i - x k )(x k - x k+1 ) ■ ■ ■ (x k - x s ) 



1 



(Xi - X k ) ■ ■ ■ (x k -l - X k )(x k - Xfc+i) • • ■ (x k - Xi) - ■ ■ (x fc - X s+ i) 

(-l) k+1 ( 1 1 



(Xi -X k ) ■ ■ '{Xk-l - X k )(x k - X k+ i) ■ ■ -{X k - Xi) ■ ■ -(x k - X s ) \ X k x l x k x s+l 

(-l) fc+1 (x f -x 8+ i) 

(xi - x k ) ■ ■ ■ (x fc _i - x fc )(x fc - x fc+ i) • • • (x fe - x s+1 ) 



(-l) fe+1 (x, - x s+1 ) 



nijeN s+ i\{fc}( x i x i) 



n 



ij'eN s+ i 



Xj Xi) 
which is the coefficient in front of $y_k$ on the right-hand side of the identity.

Suppose now $k = l$. Then, the coefficient of $y_l$ in the left-hand side of the identity is



ILjeN. + M«+l}( X i - X i) ( X l - X l) ■ ■ ' 0*1-1 - X l)( X l - X W) ■■■( X l- X s) 

(-l) m (^-^ + l) 



Oi - Xi)--- - xi)(x t - x l+1 ) ---(xi- x s+i ) 
1 U jeN s+ i v x J x v 

When $k \in \{l+1, \dots, s\}$, the coefficient in front of $y_k$ in the left-hand side of the identity is:

, fc+ i riij£N 3+1 \{fc,s+i}( x i - x i) , 1 ^ + 2ni,jeN 3+1 \{fc,z}( x i ~ x 



]_^fe+l x ^J^f±MZIlZl£ _ ( — 1) 



rijjeN s+ i\{s+l}( X i X i) Yli,j£N s+ \{l}( X j X i) 

(-l) k+1 

(X! - X k ) ■ ■ ■ (X k -i - X k )(x k - X k+ i) ---{X k - X s ) 



(xi - x k ) ■ ■ ■ {xi - x k ) ■ ■ ■ (x fc _i - x k ){x k - x k+ i) ---{x k - x s+x ) 

/ 1 1 



+ 

(Xi - X k ) ■ ■ ■ (x t - X k ) ■ ■ ■ (X k -i - X k )(x k - X k+1 ) ---{x k - X s ) \ X l ~ X k X k~ x s+l 

(-l) fc+1 (x,-x g+1 ) 

(xi - x k ) ■ ■ ■ - x k )(x k - x k+1 ) ■ ■ ■ (x k - X s+ i) 

(-l) fe+1 (^-x s+1 ) n ^ eN ^ w(Xi_Xi) 



rii,jeN a+ i( x i x i) 

which is the coefficient in front of y k on the right-hand side of the identity. 

Finally, when $k = s+1$ the coefficient of $y_{s+1}$ in the left-hand side of the identity is

1 ^_ i IlijGN i+ MM+i}( x J ~ Xi } _ (-l) s + 2 



0-(-l)' +1 (-l) 



riijeN^iX^}^' Xi ) {xi - x s+1 ) ■ ■ ■ (xi - x s+1 ) ■ ■ ■ (x s - x s+1 ) 

(-i)' +2 fa-s, + i) 

(X! - X s+1 ) ---(x s - X s+ i) 
~{-L) [Xi-X s+ i)^= _ 



which is again the coefficient of $y_{s+1}$ on the right.

Theorem 6.7 Suppose $g \in C^k(I)$. Then, for every permutation $\sigma \in P^s$, $2 \le s \le k+1$, and every
vector $(x_1, \dots, x_s)$ with distinct coordinates, we have the formula

\[ g^{[\sigma]}(x_1, \dots, x_s) = \frac{V\binom{g(x_1), \dots, g(x_s)}{x_1, \dots, x_s}}{V(x_1, \dots, x_s)}. \tag{34} \]

In particular, $g^{[\sigma]}(x_1, \dots, x_s)$ is symmetric everywhere in its domain.

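Proof. The proof is by induction on $s$. When $s = 2$ and $x_1 \ne x_2$, then by the definition we have
the representation

\[
g^{[(12)]}(x_1, x_2) = \frac{g(x_1) - g(x_2)}{x_1 - x_2}
= \frac{\begin{vmatrix} g(x_1) & g(x_2) \\ 1 & 1 \end{vmatrix}}{\begin{vmatrix} x_1 & x_2 \\ 1 & 1 \end{vmatrix}}
= \frac{V\binom{g(x_1), g(x_2)}{x_1, x_2}}{V(x_1, x_2)}.
\]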

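Suppose Representation (34) holds for $s$, $2 \le s < k+1$. Fix a permutation $\sigma \in P^s$ and an
$l \in \mathbb{N}_s$, then $\sigma_{(l)} \in P^{s+1}$. Let $y = (g(x_1), \dots, g(x_s), g(x_{s+1}))$. Using Definition (22) for any point
$(x_1, \dots, x_s, x_{s+1})$ with distinct coordinates together with Lemma 6.6 and the induction hypothesis,
we get

\[
\begin{aligned}
g^{[\sigma_{(l)}]}(x_1, \dots, x_{s+1})
&= \frac{g^{[\sigma]}(x_1, \dots, x_s) - g^{[\sigma]}(x_1, \dots, x_{l-1}, x_{s+1}, x_{l+1}, \dots, x_s)}{x_l - x_{s+1}} \\
&= \frac{1}{x_l - x_{s+1}} \left( \frac{V\binom{y_1, \dots, y_s}{x_1, \dots, x_s}}{V(x_1, \dots, x_s)}
 - \frac{V\binom{y_1, \dots, y_{l-1}, y_{s+1}, y_{l+1}, \dots, y_s}{x_1, \dots, x_{l-1}, x_{s+1}, x_{l+1}, \dots, x_s}}{V(x_1, \dots, x_{l-1}, x_{s+1}, x_{l+1}, \dots, x_s)} \right) \\
&= \frac{V\binom{y_1, \dots, y_l, y_{s+1}, y_{l+1}, \dots, y_s}{x_1, \dots, x_l, x_{s+1}, x_{l+1}, \dots, x_s}}{V(x_1, \dots, x_l, x_{s+1}, x_{l+1}, \dots, x_s)} \\
&= \frac{V\binom{y_1, \dots, y_{s+1}}{x_1, \dots, x_{s+1}}}{V(x_1, \dots, x_{s+1})}.
\end{aligned}
\]

Since $P^{s+1} = \{\sigma_{(l)} \mid \sigma \in P^s,\ l \in \mathbb{N}_s\}$ the induction step is completed. Finally, using the continuity
of $g^{[\sigma]}(x_1, \dots, x_s)$ shows that it is a symmetric function everywhere on its domain. ■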
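
The determinant formula (34) and the recursive quotient behind Lemma 6.6 can be compared numerically. The sketch below uses hypothetical function names and assumes points with distinct coordinates; the recursion applies the quotient in the case $l = 1$ only, which is sufficient for a sanity check, not a general implementation.

```python
import numpy as np

def vandermonde_ratio(g, x):
    """Right-hand side of (34): V(g(x_1),...,g(x_s); x_1,...,x_s) / V(x_1,...,x_s)."""
    x = np.asarray(x, dtype=float)
    s = len(x)
    # Rows x^{s-1}, x^{s-2}, ..., x, 1; one column per point.
    full = np.vstack([x ** p for p in range(s - 1, -1, -1)])
    top = full.copy()
    top[0, :] = g(x)  # replace the row of highest powers by the values g(x_i)
    return np.linalg.det(top) / np.linalg.det(full)

def recursive_quotient(g, x):
    """Quotient recursion (case l = 1): swap x_1 for the last point and divide."""
    x = list(x)
    if len(x) == 1:
        return float(g(np.array(x, dtype=float))[0])
    head, last = x[:-1], x[-1]
    swapped = [last] + head[1:]
    return (recursive_quotient(g, head) - recursive_quotient(g, swapped)) / (head[0] - last)
```

Both functions should agree for distinct points, and `vandermonde_ratio` should be invariant under permutations of the points, in line with the symmetry claim of Theorem 6.7.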



Now, we simplify Theorem 6.1 significantly. Define the $(k+1)$-tensor valued map
$\mathbf{g} : \mathbb{R}^n \to T^{k+1,n}$, where

\[ (\mathbf{g}(\mu))^{i_1 \dots i_{k+1}} := \frac{V\binom{g(\mu_{i_1}), \dots, g(\mu_{i_{k+1}})}{\mu_{i_1}, \dots, \mu_{i_{k+1}}}}{V(\mu_{i_1}, \dots, \mu_{i_{k+1}})}. \tag{35} \]

Technically, this definition is good only at $\mu$'s with distinct coordinates, but Theorem 6.7 shows
that it can be extended continuously everywhere. Clearly, if $(i_1, \dots, i_{k+1}) \sim (j_1, \dots, j_{k+1})$, then
$(\mathbf{g}(\mu))^{i_1 \dots i_{k+1}} = (\mathbf{g}(\mu))^{j_1 \dots j_{k+1}}$, which shows that $\mathbf{g}(\mu)$ is a $\mu$-block-constant tensor for every $\mu$.
Moreover, $\mathbf{g}(\mu)$ is a symmetric tensor, and the map $\mathbf{g} : \mathbb{R}^n \to T^{k+1,n}$ is continuous.

Theorem 6.8 Let $g$ be a $C^k$ function defined on an interval $I$. Let $X$ be a symmetric matrix with
eigenvalues in the interval $I$, and let $V$ be an orthogonal matrix such that $X = V(\mathrm{Diag}\,\lambda(X))V^T$.
Then, the matrix valued function $G$ defined above is $C^k$ at $X$. We have the formula

\[ \nabla^k G(X) = V\Big(\sum_{\sigma \in P^{k+1}} \mathrm{Diag}^{\sigma} \mathbf{g}(\lambda(X))\Big)V^T, \tag{36} \]

where the $(k+1)$-tensor valued map $\mathbf{g}(\cdot)$ is defined by Equation (35).

The next corollary is a generalization of Formula (V.22) from [2]. It is a specialization of the
last theorem to the case when $k = 2$. Since $G$ is a symmetric matrix valued function, the second
derivative $\nabla^2 G(\mathrm{Diag}\,\mu)[H_1, H_2]$ can be viewed as a symmetric matrix. For every $i = 1, \dots, n$, define
the projection onto the $i$-th coordinate axis

Pi : W l -> W l 

H%{p^) • 

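Corollary 6.9 For $g \in C^2(I)$ and any $n \times n$ symmetric matrices $H_1$, $H_2$, $H_3$ we have

\[ \langle \nabla^2 G(X)[H_1, H_2], H_3 \rangle = 2 \sum_{p_1, p_2, p_3 = 1}^{n,n,n} \mathbf{g}(\lambda(X))^{p_1 p_2 p_3}\, \tilde H_1^{p_1 p_3}\, \tilde H_2^{p_2 p_1}\, \tilde H_3^{p_3 p_2}, \]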
n,n,n 

V 2 G(X)[H 1 ,H 2 } = 2 £ g(A(X))™P Pl if 1 P P2 i7 2 P J 



1>3P2 

3 ) 



P3> 

Pl,P2,P3 = l 

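where $X = V(\mathrm{Diag}\,\lambda(X))V^T$, and $\tilde H_i = V^T H_i V$, $i = 1, 2, 3$.

Proof. Suppose first that $X = \mathrm{Diag}\,\mu$ for some $\mu \in \mathbb{R}^n_{\downarrow}$. Then

\[
\begin{aligned}
\langle \nabla^2 G(\mathrm{Diag}\,\mu)[H_1, H_2], H_3 \rangle
&= \nabla^2 G(\mathrm{Diag}\,\mu)[H_1, H_2, H_3] \\
&= \Big(\sum_{\sigma \in P^3} \mathrm{Diag}^{\sigma} \mathbf{g}(\mu)\Big)[H_1, H_2, H_3] \\
&= \sum_{\sigma \in P^3} \langle \mathbf{g}(\mu), H_1 \circ_\sigma H_2 \circ_\sigma H_3 \rangle \\
&= \langle \mathbf{g}(\mu), H_1 \circ_{(123)} H_2 \circ_{(123)} H_3 \rangle + \langle \mathbf{g}(\mu), H_1 \circ_{(132)} H_2 \circ_{(132)} H_3 \rangle \\
&= \sum_{p_1, p_2, p_3 = 1}^{n,n,n} \mathbf{g}(\mu)^{p_1 p_2 p_3} H_1^{p_1 p_3} H_2^{p_2 p_1} H_3^{p_3 p_2}
 + \sum_{q_1, q_2, q_3 = 1}^{n,n,n} \mathbf{g}(\mu)^{q_1 q_2 q_3} H_1^{q_1 q_2} H_2^{q_2 q_3} H_3^{q_3 q_1}.
\end{aligned}
\]

After re-parametrization of the second sum ($p_1 = q_2$, $p_2 = q_3$, $p_3 = q_1$), and using the fact that $\mathbf{g}(\mu)$
is a symmetric tensor, we continue

\[
= \sum_{p_1, p_2, p_3 = 1}^{n,n,n} \big(\mathbf{g}(\mu)^{p_1 p_2 p_3} + \mathbf{g}(\mu)^{p_3 p_1 p_2}\big) H_1^{p_1 p_3} H_2^{p_2 p_1} H_3^{p_3 p_2}
= 2 \sum_{p_1, p_2, p_3 = 1}^{n,n,n} \mathbf{g}(\mu)^{p_1 p_2 p_3} H_1^{p_1 p_3} H_2^{p_2 p_1} H_3^{p_3 p_2}.
\]

Showing the second representation of $\nabla^2 G(\mathrm{Diag}\,\mu)[H_1, H_2]$ and the general case, when $X$ is not an
ordered diagonal matrix, is routine. ■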
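
The first formula of Corollary 6.9 can be checked numerically. The sketch below uses hypothetical function names and assumes that the entries of the 3-tensor are the second divided differences of $g$, extended by continuity (through $g'$ and $g''$) whenever arguments coincide; the index pattern in the `einsum` call follows the corollary.

```python
import numpy as np

def dd1(g, dg, a, b):
    """First divided difference, with g'(a) when a == b."""
    return dg(a) if np.isclose(a, b) else (g(a) - g(b)) / (a - b)

def dd2(g, dg, d2g, a, b, c):
    """Second divided difference, extended by continuity to repeated points."""
    a, b, c = sorted((a, b, c))       # symmetric in its arguments; sort so equal points are adjacent
    if np.isclose(a, c):
        return 0.5 * d2g(a)
    return (dd1(g, dg, b, c) - dd1(g, dg, a, b)) / (c - a)

def second_derivative_form(g, dg, d2g, X, H1, H2, H3):
    """<grad^2 G(X)[H1, H2], H3> via the first formula of Corollary 6.9."""
    lam, V = np.linalg.eigh(X)
    n = len(lam)
    T = np.array([[[dd2(g, dg, d2g, lam[p], lam[q], lam[r])
                    for r in range(n)] for q in range(n)] for p in range(n)])
    Ht1, Ht2, Ht3 = (V.T @ H @ V for H in (H1, H2, H3))
    # indices: p = p_1, q = p_2, r = p_3
    return 2.0 * np.einsum('pqr,pr,qp,rq->', T, Ht1, Ht2, Ht3)
```

For $g(x) = x^3$ we have $G(X) = X^3$, whose second derivative in the directions $H_1, H_2$ is the sum of the six products containing $X$, $H_1$ and $H_2$ once each, so the value can be compared with an exact expression.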



7 The Hessian of spectral functions, revisited 

In this last section, we illustrate one more time the machinery developed so far. We recalculate the
Hessian of a general spectral function at an arbitrary matrix.

One of the strengths of the new approach is that one doesn't need to have a preconceived
notion about the form of these derivatives. (Recall that in [S] the formula for the Hessian of the
spectral function was first stated, and then it was proven that it is indeed the correct one. The
hindsight for that formula came from jjj.) Here, we simply differentiate, applying the rules developed so
far, to arrive at the correct formula. The approach also clearly shows where the different pieces of
the Hessian come from. This should make the calculation routine and clearer.



7.1 Two matrix valued maps 



Let $\mu \in \mathbb{R}^n \to T(\mu) \in T^{1,n}$ be a $\mu$-symmetric, differentiable, 1-tensor-valued map. (In the next
section, $T(\mu) = \nabla f(\mu)$, where $f$ is a symmetric $C^2$ function.) We define two matrix valued maps
$D_0T$ and $D_1T$ that play an important role in the description of the Hessian of spectral functions.
First

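\[ D_0T(\mu) = \nabla T(\mu), \]

or in other words

\[ (D_0T(\mu))^{i_1 i_2} = \frac{\partial T^{i_1}(\mu)}{\partial \mu_{i_2}}. \]

Next, define the matrix $D_1T(\mu)$ as follows

\[
(D_1T(\mu))^{i_1 i_2} =
\begin{cases}
0, & \text{if } i_1 = i_2, \\[4pt]
(\nabla T)^{i_1 i_1}(\mu) - (\nabla T)^{i_1 i_2}(\mu), & \text{if } i_1 \sim i_2, \\[4pt]
\dfrac{T^{i_2}(\mu) - T^{i_1}(\mu)}{\mu_{i_2} - \mu_{i_1}}, & \text{if } i_1 \not\sim i_2,
\end{cases}
\]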



where the equivalence relation is with respect to the vector $\mu$. Several of the properties of $D_1T$ are
easily seen from the following integral representation.

Lemma 7.1 If $T(\mu) \in T^{1,n}$ is a continuously differentiable and $\mu$-symmetric map, then for every $i_1$,
$i_2 \in \{1, \dots, n\}$ we have the representation

\[
(D_1T(\mu))^{i_1 i_2} = \int_0^1 \Big[ (\nabla T)^{i_1 i_1}\big(\dots, \mu_{i_1} + t(\mu_{i_2} - \mu_{i_1}), \dots, \mu_{i_2} + t(\mu_{i_1} - \mu_{i_2}), \dots\big)
- (\nabla T)^{i_1 i_2}\big(\dots, \mu_{i_1} + t(\mu_{i_2} - \mu_{i_1}), \dots, \mu_{i_2} + t(\mu_{i_1} - \mu_{i_2}), \dots\big) \Big]\, dt,
\]

where the first displayed argument is in position $i_1$ and the second displayed argument is in position
$i_2$. The missing arguments are the corresponding, unchanged, entries of $\mu$.

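Proof. The first case, when $i_1 = i_2$, is immediate. In the second, $i_1 \sim i_2$ implies that $\mu_{i_1} = \mu_{i_2}$ and
the integrand doesn't depend on $t$. In the third case, $i_1 \not\sim i_2$, we can compute the integral using
the Fundamental Theorem of Calculus:

\[
\begin{aligned}
(D_1T(\mu))^{i_1 i_2}
&= \frac{1}{\mu_{i_2} - \mu_{i_1}} \int_0^1 \frac{d}{dt}\, T^{i_1}\big(\dots, \mu_{i_1} + t(\mu_{i_2} - \mu_{i_1}), \dots, \mu_{i_2} + t(\mu_{i_1} - \mu_{i_2}), \dots\big)\, dt \\
&= \frac{T^{i_1}(\dots, \mu_{i_2}, \dots, \mu_{i_1}, \dots) - T^{i_1}(\dots, \mu_{i_1}, \dots, \mu_{i_2}, \dots)}{\mu_{i_2} - \mu_{i_1}} \\
&= \frac{T^{i_2}(\dots, \mu_{i_1}, \dots, \mu_{i_2}, \dots) - T^{i_1}(\dots, \mu_{i_1}, \dots, \mu_{i_2}, \dots)}{\mu_{i_2} - \mu_{i_1}},
\end{aligned}
\]

where the last equality follows from the fact that $T(\mu)$ is $\mu$-symmetric. ■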



Lemma 7.2 If $T(\mu)$ is differentiable, then both $D_0T(\mu)$ and $D_1T(\mu)$ are $\mu$-symmetric maps.

Proof. The fact that $D_0T(\mu)$ is $\mu$-symmetric is Lemma 2.1. This implies that if $i_1 \sim j_1$, then
$(\nabla T)^{i_1 i_1}(\mu) = (\nabla T)^{j_1 j_1}(\mu)$. Also, if $i_1 \sim j_1$ and $i_2 \sim j_2$ with $i_1 \ne i_2$ and $j_1 \ne j_2$, then $(\nabla T)^{i_1 i_2}(\mu) =
(\nabla T)^{j_1 j_2}(\mu)$. The fact that $T$ is $\mu$-symmetric implies that if $i_1 \sim j_1$, then $T^{i_1}(\mu) = T^{j_1}(\mu)$. Now it
is easy to see that $D_1T(\mu)$ is $\mu$-symmetric. ■

We conclude this section with a summary of the properties of $D_0T(\mu)$ and $D_1T(\mu)$.

• For every $l = 0, 1$, $D_lT(\mu)$ is a matrix valued, $\mu$-symmetric map.

• For every $l = 0, 1$, $D_lT(\mu)$ is as smooth as $\nabla T(\mu)$. In other words, if $\nabla T(\mu)$ is continuous, or
several times (continuously) differentiable, then so is $D_lT(\mu)$.

• In addition, if $T = \nabla f(\mu)$ where $f : \mathbb{R}^n \to \mathbb{R}$ is a symmetric $C^2$ function, then for every
$l = 0, 1$, $D_lT(\mu)$ is a symmetric matrix for every $\mu$.



7.2 $f \circ \lambda$ is twice (continuously) differentiable if, and only if, $f$ is

Suppose that $f$ is a symmetric function, twice differentiable at $\mu \in \mathbb{R}^n_{\downarrow}$. Let $M$ be an arbitrary
symmetric matrix. Using Formula (19) together with Formula (16) we compute:

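\[
\begin{aligned}
\nabla^2 (f \circ \lambda)(\mathrm{Diag}\,\mu)[M]
&= \lim_{m\to\infty} \frac{\nabla(f \circ \lambda)(\mathrm{Diag}\,\mu + M_m) - \nabla(f \circ \lambda)(\mathrm{Diag}\,\mu)}{\|M_m\|} \\
&= \lim_{m\to\infty} \frac{U_m\big(\mathrm{Diag}^{(1)} \nabla f(\lambda(\mathrm{Diag}\,\mu + M_m))\big)U_m^T - \mathrm{Diag}^{(1)} \nabla f(\mu)}{\|M_m\|} \\
&= \lim_{m\to\infty} \frac{U_m\big(\mathrm{Diag}^{(1)} \nabla f(\mu + h_m + o(\|M_m\|))\big)U_m^T - \mathrm{Diag}^{(1)} \nabla f(\mu)}{\|M_m\|} \\
&= \lim_{m\to\infty} \frac{U_m\big(\mathrm{Diag}^{(1)} (\nabla f(\mu) + \nabla^2 f(\mu)[h_m] + o(\|M_m\|))\big)U_m^T - \mathrm{Diag}^{(1)} \nabla f(\mu)}{\|M_m\|} \\
&= \lim_{m\to\infty} \frac{U_m\big(\mathrm{Diag}^{(1)} \nabla f(\mu)\big)U_m^T - \mathrm{Diag}^{(1)} \nabla f(\mu)}{\|M_m\|} + U\big(\nabla^2 f(\mu)[h]\big)U^T.
\end{aligned}
\]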

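For convenience let $T = \nabla f(\mu)$, a block-constant 1-tensor. Using Corollary 2.4 we see that

\[ \lim_{m\to\infty} \frac{U_m(\mathrm{Diag}^{(1)} T)U_m^T - \mathrm{Diag}^{(1)} T}{\|M_m\|} = \big(\mathrm{Diag}^{(12)} T^{(1)}_{\mathrm{out}}\big)[M]. \]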

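Denote $A_1 = D_0T$, where the operator $D_0$ is defined in Section 7.1. Notice that there is a block-constant
vector $b$ such that $A_1 - \mathrm{Diag}\,b$ is a block-constant 2-tensor. Using this notation and
Corollary 2.5 we continue:

\[
\begin{aligned}
U\big(\nabla^2 f(\mu)[h]\big)U^T
&= U\big((A_1 - \mathrm{Diag}\,b + \mathrm{Diag}\,b)[h]\big)U^T \\
&= U\big((A_1 - \mathrm{Diag}\,b)[h]\big)U^T + U\big((\mathrm{Diag}\,b)[h]\big)U^T \\
&= \big(\mathrm{Diag}^{(1)(2)} (A_1 - \mathrm{Diag}\,b)\big)[M] + \big(\mathrm{Diag}^{(12)} b^{(1)}_{\mathrm{in}}\big)[M].
\end{aligned}
\]

This shows that $f \circ \lambda$ is twice differentiable.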

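In order to prove that $f \circ \lambda$ is twice continuously differentiable we need to reorganize the pieces.
Let $A_2 = D_1T$, where the operator $D_1$ is defined in Section 7.1. Notice that the sum $A_1 + A_2$ is a
block-constant 2-tensor. This means that the vector $b$ is (can be chosen) such that $A_2 + \mathrm{Diag}\,b$ is
block-constant, and

\[ A_2 + \mathrm{Diag}\,b = T^{(1)}_{\mathrm{out}} + b^{(1)}_{\mathrm{in}}. \]

Putting everything together we obtain:

\[
\begin{aligned}
\nabla^2 (f \circ \lambda)(\mathrm{Diag}\,\mu)
&= \mathrm{Diag}^{(12)} T^{(1)}_{\mathrm{out}} + \mathrm{Diag}^{(1)(2)} (A_1 - \mathrm{Diag}\,b) + \mathrm{Diag}^{(12)} b^{(1)}_{\mathrm{in}} \\
&= \mathrm{Diag}^{(1)(2)} (A_1 - \mathrm{Diag}\,b) + \mathrm{Diag}^{(12)} (A_2 + \mathrm{Diag}\,b) \\
&= \mathrm{Diag}^{(1)(2)} A_1 + \mathrm{Diag}^{(12)} A_2.
\end{aligned}
\]

In the last equality we used the fact that $\mathrm{Diag}^{(1)(2)}(\mathrm{Diag}\,b) = \mathrm{Diag}^{(12)}(\mathrm{Diag}\,b)$, which is very easy
to verify. The discussion in jHJ Section 6] shows that

\[ \nabla^2 (f \circ \lambda)(X) = V\big(\mathrm{Diag}^{(1)(2)} A_1(\lambda(X)) + \mathrm{Diag}^{(12)} A_2(\lambda(X))\big)V^T, \tag{37} \]

where $X = V(\mathrm{Diag}\,\lambda(X))V^T$.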

Moreover, we showed in Section 7.1 that if $f$ is $C^2$, then both $A_1$ and $A_2$ are continuous. By
Proposition 6.2 in [S] it follows that $\nabla^2 (f \circ \lambda)$ is continuous. That is, $f$ is $C^2$ if, and only if, $f \circ \lambda$
is.
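
Formula (37) can be illustrated numerically. The sketch below uses hypothetical function names and rests on two assumptions about the notation: that $\mathrm{Diag}^{(1)(2)} A_1$ acts on $(\tilde H_1, \tilde H_2)$ as $\sum_{i,j} A_1^{ij} \tilde H_1^{ii} \tilde H_2^{jj}$, and that $\mathrm{Diag}^{(12)} A_2$ acts as $\sum_{i,j} A_2^{ij} \tilde H_1^{ij} \tilde H_2^{ji}$, with $\tilde H_i = V^T H_i V$.

```python
import numpy as np

def spectral_hessian_form(grad_f, hess_f, X, H1, H2):
    """Bilinear form of the Hessian of f o lambda at X, following Formula (37)."""
    lam, V = np.linalg.eigh(X)
    T = grad_f(lam)              # T = grad f(lam), a 1-tensor
    A1 = hess_f(lam)             # A_1 = D_0 T = grad T
    n = len(lam)
    A2 = np.zeros((n, n))        # A_2 = D_1 T, built from its three cases
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                                  # zero on the diagonal
            if np.isclose(lam[i], lam[j]):
                A2[i, j] = A1[i, i] - A1[i, j]            # equal eigenvalues
            else:
                A2[i, j] = (T[j] - T[i]) / (lam[j] - lam[i])
    Ht1 = V.T @ H1 @ V
    Ht2 = V.T @ H2 @ V
    diag_part = np.diag(Ht1) @ A1 @ np.diag(Ht2)          # assumed action of Diag^{(1)(2)} A_1
    off_part = np.sum(A2 * Ht1 * Ht2)                     # assumed action of Diag^{(12)} A_2 (Ht2 symmetric)
    return diag_part + off_part
```

For $f(x) = (\sum_i x_i)^2 + \sum_i x_i^3$ the spectral function is $(\mathrm{tr}\,X)^2 + \mathrm{tr}\,X^3$, whose Hessian is $2\,\mathrm{tr}(H_1)\,\mathrm{tr}(H_2) + 3\,\mathrm{tr}(X H_2 H_1) + 3\,\mathrm{tr}(H_2 X H_1)$, which gives an exact value to compare against, including at matrices with repeated eigenvalues.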